(54) Method for surmising functional site in physiologically active polypeptide or polynucleotide.

(57) A method is provided by which the functional site or region of an objective polypeptide, protein or polynucleotide, such as a catalytically active site or a binding site, can be surmised based only on the amino acid sequence of the polypeptide or protein or on the nucleotide sequence of the polynucleotide, and which can generally be applied to a wide range of polypeptides, proteins and polynucleotides.

(1) A plurality of polypeptides or proteins whose amino acid sequences and functional sites are already known (hereinafter referred to as "reference polypeptides") are selected; (2) the relations between the functional sites and the amino acid sequences that are common to these polypeptides are extracted, and a law is derived from these relations; and (3) the law is applied to a polypeptide whose amino acid sequence is known but whose functional site or region is not known (the "test polypeptide").
BACKGROUND OF THE INVENTION
Field of the Invention

The present invention relates to a method for surmising a functional site or region of a physiologically active polypeptide, protein or polynucleotide. If the functional site or region of a physiologically active polypeptide, protein or polynucleotide can be surmised with high probability, this is extremely useful industrially, because such information gives deep insight when one wants to modify an appropriate position or region of the polypeptide or protein to construct a more useful one with desired functions. This invention is particularly useful for creating novel and useful polypeptides, proteins or polynucleotides by modifying their functional regions with a method such as that of M. Ballivet et al. (Literature 10).

Related Art 

Although various methods have been proposed for surmising the functional site or region of a physiologically active polypeptide or protein (Literatures 1 to 8, 11 and 12), no method has been known that can surmise the functional site or region of an objective polypeptide or protein using only the information on its amino acid sequence and that can generally be applied to every polypeptide or protein existing in nature. That is, apart from methods based on homologous sequences (homology) (Literatures 5 and 12), it is extremely difficult to surmise the catalytically active site of an enzyme or the binding region of a receptor, a lymphokine or the like. Further, since proteins generally contain many functional sites or regions involved in molecular recognition and response outside the homologous sequences, it is difficult to find them from homologous sequences alone.

SUMMARY OF THE INVENTION

The present invention aims at providing a method that can surmise the functional site or region of an objective polypeptide, protein or polynucleotide based on its amino acid sequence or nucleotide sequence alone, and that can moreover generally be applied to a wide range of polypeptides, proteins and polynucleotides.

The above problem is solved by the method of this invention, which comprises (1) selecting a plurality of polypeptides or proteins (hereinafter referred to as "reference polypeptides") whose amino acid sequences and active sites have already been confirmed, (2) extracting the relations between the active sites and the amino acid sequences that are common to these reference polypeptides and deriving a law from these relations, and (3) applying the law to a polypeptide (hereinafter referred to as the "test polypeptide") whose amino acid sequence is known but whose functional site or region is not known.

This method can also be used for surmising the functional site or region of a polynucleotide such as a ribozyme.

BRIEF DESCRIPTION OF DRAWINGS

Fig. 1 denotes a surmised pattern of the functional region of bovine pancreatic ribonuclease.

Fig. 2 denotes a surmised pattern of the functional region of canine pancreatic cationic trypsinogen.

Fig. 3 denotes a surmised pattern of the functional region of rat preprocarboxypeptidase A.

Fig. 4 denotes a surmised pattern of the functional region of human carbonic anhydrase I.

Fig. 5 denotes a surmised pattern of the functional region of human Cu/Zn superoxide dismutase.

Fig. 6 denotes a surmised pattern of the functional region of chicken triosephosphate isomerase.

Fig. 7 denotes a surmised pattern of the functional region of human type I alcohol dehydrogenase.

Fig. 8 denotes a surmised pattern of the functional region of Escherichia coli glutathione reductase.

Fig. 9 denotes a surmised pattern of the functional region of rat liver catalase.

Fig. 10 denotes a surmised pattern of the functional region of bovine lysozyme C.

Fig. 11 denotes a surmised pattern of the functional region of kiwi fruit actinidine.

Fig. 12 denotes a surmised pattern of the functional region of human glyceraldehyde-3-phosphate dehydrogenase.

Fig. 13 denotes a surmised pattern of the functional region of bovine pancreatic phospholipase A2.

Fig. 14 denotes a surmised pattern of the functional region of B. stearothermophilus thermolysin.

Fig. 15 denotes a surmised pattern of the catalytically functional region of tobacco ringspot virus (TRSV).

In these figures, "sum" denotes the probability of a functional site, such as a catalytically active site or a binding site, of the protein.
DETAILED DESCRIPTION

The physiologically active polypeptides or proteins in this invention include all peptide substances having some physiological activity, and comprehensively include substances such as enzymes, hormones and various interleukins. A functional site means a site consisting of one or a small number of amino acids required for these physiologically active polypeptides to express their functions; for example, in an enzyme it means the catalytically active site, and in other polypeptides it means the binding site. Further, a functional region means a region of some extent that is involved in the expression of the physiological activity of a physiologically active polypeptide.

This invention is specifically described below. 

The "reference polypeptides" in the invention can be any polypeptides whose amino acid sequences and functional sites are known. Their number is not particularly critical; the larger the number, the more accurate the surmise is thought to be, and about 10 or more reference polypeptides are sufficient. In the specific examples of the invention, the 17 enzymes shown in Table 1 (Literature 7) were used.
Table 1

ID No.  Enzyme name                                Length of the          Active site       ID name
                                                   amino acid sequence                      of PDB
1       Ribonuclease                               124                    12, 41*, 119      1RN3
2       Trypsinogen                                223                    40, 84, 117       2PTN
3       Carboxypeptidase A                         307                    *127, 248, 270    5CPA
4       Carbonic anhydrase I                       260                    106, 199          2CAB
5       L-lactate dehydrogenase                    329                    99, 169, 193      4LDH
6       Superoxide dismutase                       151                    61, 141           2SOD
7       Penicillopepsin                            323                    33, 213           2APP
8       Triosephosphate isomerase                  247                    164               1TIM
9       Alcohol dehydrogenase                      374                    48, 51            4ADH
10      Glutathione reductase                      478                    58, 63, 467       2GRS
11      Rhodanese                                  293                    186, 247, 249     1RHD
12      Catalase                                   506                    74, 147           8CAT
13      Lysozyme C                                 130                    35, 53            1LZ1
14      Actinidine                                 220                    25, 162           2ACT
15      Glyceraldehyde 3-phosphate dehydrogenase   333                    148, 175          1GPD
16      Prophospholipase A2                        130                    55, 106*          2BPZ
17      Thermolysin                                316                    143, 231          3TLN

The positions marked * were quoted from the BION/Swiss-Prot database.
Then, in the amino acid sequences of the reference polypeptides selected in this way, a region surrounded by several amino acids upstream and downstream of the position of each catalytically active site (the "supposed functional region") is extracted. The number of amino acids in the supposed functional region is not particularly limited; for example, about 15 is sufficient.

Then, using the 20 amino acids that occur in natural proteins, amino acid sequence patterns represented by the following formula (I) are supposed:

$$X_1\,(Z)_n\,X_2 \qquad \text{(I)}$$

This amino acid sequence pattern is referred to as a "supposed amino acid sequence pattern" in the invention. In formula (I), X1 and X2 are the same or different specific amino acids selected from the 20 amino acids.

In this case, although it is desirable to use all 20 amino acids usually existing in proteins, namely glycine (Gly; G), alanine (Ala; A), valine (Val; V), leucine (Leu; L), isoleucine (Ile; I), serine (Ser; S), threonine (Thr; T), aspartic acid (Asp; D), glutamic acid (Glu; E), asparagine (Asn; N), glutamine (Gln; Q), lysine (Lys; K), arginine (Arg; R), cysteine (Cys; C), methionine (Met; M), phenylalanine (Phe; F), tyrosine (Tyr; Y), tryptophan (Trp; W), histidine (His; H) and proline (Pro; P), a small number of amino acids whose occurrence frequencies in natural proteins are very low and which do not significantly influence the method of the invention may be omitted.

Each Z is the same or different and denotes any unspecified amino acid among the above amino acids, and n is 0 or an integer from 1 to about 30. Although the upper limit of n is not particularly critical, 30 is sufficient. For convenience of description, the amino acid X1 is referred to as the "reference amino acid".

Then, a plurality of, or all, amino acid sequence patterns satisfying formula (I) are supposed. For example, when X1 and X2 are each selected from the 20 amino acids and n, i.e., the number of unspecified amino acids Z, ranges from 0 to 30, 20 × 20 × 31 = 12,400 amino acid sequence patterns are obtained.

The upper limit of n need not be 30, however. For example, when the upper limit of n is 2, namely n is 0, 1 or 2, the number of supposed amino acid sequence patterns becomes 20 × 20 × 3 = 1,200. Thus, sequence patterns such as Gly Gly, Gly Ala, Gly Val, ... (n = 0); Gly Z Gly, Gly Z Ala, Gly Z Val, ... (n = 1); and Gly Z Z Gly, Gly Z Z Ala, Gly Z Z Val, ... (n = 2) are constructed, where Z may be any of the 20 amino acids.
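The pattern counts above (12,400 and 1,200) can be checked with a short Python sketch. Representing each supposed pattern as a tuple (x1, n, x2) and the helper names `supposed_patterns` and `as_text` are illustrative assumptions, not part of the described method.

```python
from itertools import product

# One-letter codes of the 20 amino acids listed in the text.
AMINO_ACIDS = "GAVLISTDENQKRCMFYWHP"


def supposed_patterns(n_max):
    """Enumerate every supposed pattern X1 (Z)n X2 with 0 <= n <= n_max.

    Each pattern is a tuple (x1, n, x2): x1 and x2 are specific amino acids,
    n is the number of unspecified residues Z between them.
    """
    return [(x1, n, x2)
            for x1, x2 in product(AMINO_ACIDS, repeat=2)
            for n in range(n_max + 1)]


def as_text(pattern):
    """Render (x1, n, x2) in the notation of the text, e.g. 'G Z Z A'."""
    x1, n, x2 = pattern
    return " ".join([x1] + ["Z"] * n + [x2])


print(len(supposed_patterns(30)))  # 12400
print(len(supposed_patterns(2)))   # 1200
print(as_text(("G", 2, "A")))      # G Z Z A
```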

Next, each reference amino acid sequence is scanned with each of these supposed amino acid sequence patterns by moving the pattern along the sequence, and the number of times X1 and X2 of the pattern coincide with the amino acids at the corresponding positions of the reference sequence is summed. These occurrences are counted separately for the "supposed functional regions" and for the remainder of the reference amino acid sequences (referred to as the "supposed non-functional region").
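A minimal sketch of the scanning and tallying step follows. It assumes that active-site positions are 1-based, that the supposed functional region is the active site plus 7 residues on each side (about 15 residues, as suggested above), and that an occurrence is attributed to a region according to where its X1 lies; these choices and all function names are hypothetical. Visiting every X1 position once and reading the residue n + 1 positions downstream is equivalent to sliding each pattern along the sequence.

```python
from collections import Counter


def functional_mask(length, active_sites, half_width=7):
    """True for residues within +/- half_width of any 1-based active site."""
    mask = [False] * length
    for site in active_sites:
        start = max(0, site - 1 - half_width)
        stop = min(length, site + half_width)  # exclusive end
        for i in range(start, stop):
            mask[i] = True
    return mask


def tally_patterns(sequence, active_sites, n_max=12, half_width=7):
    """Count every pattern X1 (Z)n X2 found in one reference sequence.

    Returns (functional, non_functional), two Counters keyed by (x1, n, x2).
    An occurrence is assigned to the region containing its X1 (assumption).
    """
    mask = functional_mask(len(sequence), active_sites, half_width)
    functional, non_functional = Counter(), Counter()
    for i, x1 in enumerate(sequence):
        for n in range(n_max + 1):
            j = i + n + 1  # position of X2
            if j >= len(sequence):
                break
            target = functional if mask[i] else non_functional
            target[(x1, n, sequence[j])] += 1
    return functional, non_functional


f, nf = tally_patterns("ACDAC", active_sites=[1], n_max=2, half_width=1)
print(sum(f.values()), sum(nf.values()))  # 6 3
```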

This operation is carried out on, for example, 10 or more reference amino acid sequences. Although the total number of supposed amino acid sequence patterns used in this operation is not particularly restricted, 20 × 20 × 13 or more (i.e., n ranging from 0 to at least 12) is desirable.

Surprisingly, this operation revealed to the present inventors that the supposed amino acid sequence patterns used could be classified into those that specifically occur in, or favor, the supposed functional regions of the reference amino acid sequences and those that specifically occur in, or favor, the supposed non-functional regions (a small number of supposed amino acid sequence patterns were specific to neither region).

As a typical example, when the supposed amino acid sequence patterns occurring more than 20 times were selected from the amino acid sequences of the 17 reference polypeptides, using 20 × 20 × 13 supposed amino acid sequence patterns (n = 0 to 12), the patterns whose reference amino acid X1 is Cys, His, Arg, Ala, Gly, Ser, Thr or Asn favored the supposed functional regions, whereas the patterns whose reference amino acid X1 is Asp, Glu, Phe, Lys, Leu, Pro or Tyr favored the supposed non-functional regions.
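The text does not state the exact statistic used to decide which region a pattern favors. The sketch below assumes one plausible criterion: pool the occurrences of all sufficiently frequent patterns (more than 20 occurrences) that share the same X1, and compare the fraction found in the supposed functional regions with the overall fraction. The function name, the criterion and the toy counts are assumptions; X1 values whose patterns are all too rare are simply left unclassified.

```python
from collections import Counter


def classify_reference_amino_acids(functional, non_functional, min_count=20):
    """Label each X1 as favoring the functional or non-functional region.

    functional, non_functional: Counters keyed by (x1, n, x2), e.g. summed
    over all reference sequences. Only patterns occurring more than
    min_count times in total are used. Assumed criterion: an X1 favors the
    functional region when the share of its pooled occurrences lying in
    functional regions exceeds the overall share, and vice versa.
    """
    total_f = sum(functional.values())
    baseline = total_f / (total_f + sum(non_functional.values()))
    f_by_x1, all_by_x1 = Counter(), Counter()
    for key in set(functional) | set(non_functional):
        count = functional[key] + non_functional[key]
        if count > min_count:
            f_by_x1[key[0]] += functional[key]
            all_by_x1[key[0]] += count
    labels = {}
    for x1, count in all_by_x1.items():
        share = f_by_x1[x1] / count
        if share > baseline:
            labels[x1] = "functional"
        elif share < baseline:
            labels[x1] = "non-functional"
    return labels  # X1 values with only rare patterns are left out


# Made-up counts for illustration only.
functional = Counter({("C", 0, "G"): 30, ("D", 0, "L"): 5})
non_functional = Counter({("C", 0, "G"): 10, ("D", 0, "L"): 40, ("W", 0, "W"): 3})
labels = classify_reference_amino_acids(functional, non_functional)
print(labels["C"], labels["D"], "W" in labels)  # functional non-functional False
```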

Some supposed amino acid sequence patterns, depending on their reference amino acid X1, could not be assigned to either region by the above classification criterion because the amino acid in question occurs with low frequency.

Then, among the supposed amino acid sequence patterns classified into the supposed functional regions or the supposed non-functional regions, the patterns having the same reference amino acid X1 are collected separately for each region and superposed on one another at the reference amino acid, so that they are aligned into a single amino acid sequence pattern. Since each such pattern is specifically related to either the supposed functional regions or the supposed non-functional regions, it is referred to as a "related amino acid sequence pattern".
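The superposition step can be sketched as follows, assuming each favored pattern is given as (x1, n, x2): X2 is placed n + 1 positions after the common X1, alternatives at the same offset are merged into the (A/B) notation, and offsets that no favored pattern reaches become Z. The input list is hypothetical, chosen only so that the output has the form of pattern 1 (SEQ ID: 1); it is not the inventors' actual set of favored patterns.

```python
def superpose(x1, favored):
    """Merge favored patterns (x1, n, x2) that share the reference amino acid x1.

    X2 of a pattern with n spacers is placed at offset n + 1 from X1.
    Alternatives at one offset are written (A/B); offsets that no pattern
    reaches are written Z, following the notation of the text.
    """
    columns = {}
    for p_x1, n, x2 in favored:
        if p_x1 != x1:
            continue
        alternatives = columns.setdefault(n + 1, [])
        if x2 not in alternatives:
            alternatives.append(x2)
    tokens = [x1]
    for offset in range(1, max(columns, default=0) + 1):
        alts = columns.get(offset, [])
        if not alts:
            tokens.append("Z")
        elif len(alts) == 1:
            tokens.append(alts[0])
        else:
            tokens.append("(" + "/".join(alts) + ")")
    return " ".join(tokens)


# Hypothetical favored patterns, chosen to reproduce the shape of pattern 1.
favored = [("Cys", 0, "Gly"), ("Cys", 0, "Ser"), ("Cys", 1, "Ala"),
           ("Cys", 3, "Gly"), ("Cys", 5, "Gly"), ("Cys", 5, "Ala"),
           ("His", 2, "Ser")]
print(superpose("Cys", favored))  # Cys (Gly/Ser) Ala Z Gly Z (Gly/Ala)
```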

By this operation, the following "related amino acid sequence patterns" were obtained in the above specific examples of the invention. Namely, as functional region-related amino acid sequence patterns:

1. Cys (Gly/Ser) Ala Z Gly Z (Gly/Ala) (SEQ ID: 1)

2. His Z Z Ser (Gly/Ser) Z Gly Z Gly Z Gly (SEQ ID: 2)

3. Arg Ser Z Z Z Ala Z Ser Z Z (Gly/Ala/Ser) (SEQ ID: 3)

4. Ala Z Ser Z Z Z Z Gly Z Z Gly Z Ala Z (SEQ ID: 4)

5. Gly (Gly/Ser) Z Z Z Ser Gly Ala Ser Gly (Ala/Gly) Gly (Gly/Ser) (SEQ ID: 5)

6. Ser (Gly/Ser) Ala Ser Z Gly Ser Z Gly Z Ser Z Gly (SEQ ID: 6)

7. Thr (Gly/Ala) Ala Z Ala (Ala/Ser) Z Ala (Ala/Ser) (Gly/Ser) (SEQ ID: 7)

8. Asn Z Z Ser (Gly/Ala) Ala Z (Ser/Ala) Gly (Ser/Ala) Z Z Ser (SEQ ID: 8)

As non-functional region-related amino acid sequence patterns:

9. Asp Pro (Leu/Lys) Z Leu Z Z Val Lys Leu Leu Leu (Gly/Lys) (SEQ ID: 9)

10. Glu Z Z Z Z Lys Z Leu Leu (Asp/Leu) (Gly/Asp) Z (Gly/Lys) (SEQ ID: 10)

11. Phe (Gly/Leu) Asp Asp (Gly/Val) Z (Val/Asp) Val Z Z (Gly/Leu) Val Gly Asp (SEQ ID: 11)

12. Lys Leu (Asp/Lys) Z Z Gly Leu Gly (Asp/Lys) (Val/Leu) Gly Leu Asp Leu (SEQ ID: 12)

13. Leu Lys Asp Z (Gly/Asp) Z Z Z Leu (Asp/Lys) (Asp/Val) (Asp/Lys) (Val/Leu) (Lys/Val) (SEQ ID: 13)

14. Pro Asp Lys Gly Z Z Z Z (Val/Gly) (Gly/Lys) Lys Val (Gly/Lys) Leu (SEQ ID: 14)

15. Tyr Val Z Leu (Val/Leu) Val Z Z Leu Z Asp (SEQ ID: 15)

In the above related amino acid sequence patterns, the underlined amino acid of each pattern, i.e., its first amino acid, is the reference amino acid X1.

The amino acids separated by a slash within parentheses are alternatives, each of which can serve as X2. For example, in SEQ ID: 2, His can pair with Gly or Ser, namely His Z Z Z Gly or His Z Z Z Ser.

Next, in order to surmise, using the "related amino acid sequence patterns", the active sites of a polypeptide (test polypeptide) whose amino acid sequence is known but whose active sites are not known, it is necessary to prepare a "discrimination criterion".

This discrimination criterion is derived by applying the above "related amino acid sequence patterns", namely the functional region-related amino acid sequence patterns, the non-functional region-related amino acid sequence patterns, or both, to reference amino acid sequences whose active sites are already known. As a result, a condition under which the active site in a reference amino acid sequence is most reliably associated with the related amino acid sequence patterns can be derived.

For example, from each related amino acid sequence pattern specific to the supposed functional region, a pattern is prepared that is symmetrical about the reference amino acid placed at its center (this is referred to as a "symmetrically related amino acid sequence pattern"). For example, when the related amino acid sequence pattern (1) is

Cys (Gly/Ser) Ala Z Gly Z (Gly/Ala) (SEQ ID: 16),

its symmetrically related amino acid sequence pattern is

(Gly/Ala) Z Gly Z Ala (Gly/Ser) Cys (Gly/Ser) Ala Z Gly Z (Gly/Ala) (SEQ ID: 17).

Such a "symmetrically related amino acid sequence pattern" can be prepared from each related amino acid sequence pattern.
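Building the symmetrically related pattern amounts to mirroring the part of the pattern after the reference amino acid around it. A minimal Python sketch, assuming patterns are written without spaces inside the parentheses and represented as one tuple of alternatives per position (an empty tuple for Z); the function names are illustrative:

```python
def parse_pattern(text):
    """Tokenize e.g. 'Cys (Gly/Ser) Ala Z' into tuples of alternatives; Z -> ()."""
    tokens = []
    for word in text.split():
        word = word.strip("()")
        tokens.append(() if word == "Z" else tuple(word.split("/")))
    return tokens


def symmetric(tokens):
    """Mirror the positions after the reference amino acid around it."""
    reference, tail = tokens[0], tokens[1:]
    return tail[::-1] + [reference] + tail


def render(tokens):
    """Write tokens back in the notation of the text."""
    words = []
    for alts in tokens:
        if not alts:
            words.append("Z")
        elif len(alts) == 1:
            words.append(alts[0])
        else:
            words.append("(" + "/".join(alts) + ")")
    return " ".join(words)


print(render(symmetric(parse_pattern("Cys (Gly/Ser) Ala Z Gly Z (Gly/Ala)"))))
# (Gly/Ala) Z Gly Z Ala (Gly/Ser) Cys (Gly/Ser) Ala Z Gly Z (Gly/Ala)
```

Applied to pattern (1), the output reproduces SEQ ID: 17.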

Then, this "symmetrically related amino acid sequence pattern" is moved from the N-terminus to the C-terminus of the reference amino acid sequence, and the number of amino acids that coincide completely is counted. For example, within a range of 5 to 10 amino acids upstream and downstream of the reference amino acid, the number of "same amino acids", i.e., coinciding amino acids other than the reference amino acid, can be counted by comparing the "symmetrically related amino acid sequence pattern" with the reference amino acid sequence.
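The counting of "same amino acids" can be sketched as below. Two assumptions are made: the symmetric pattern is centered on every position of the sequence (the residue at the center need not equal the pattern's reference amino acid, since the requisites evaluate patterns with different reference amino acids at one site), and a Z position never counts. The pattern is the one-letter form of SEQ ID: 17; the test sequence is made up.

```python
def same_amino_acid_count(sequence, sym_tokens, center, window=10):
    """Count residues matching a symmetric pattern centred at `center` (0-based).

    sym_tokens: list of tuples of allowed one-letter amino acids, () for Z,
    with the reference amino acid in the middle. Only offsets 1..window on
    either side are compared; the central (reference) position is skipped.
    """
    mid = len(sym_tokens) // 2
    count = 0
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        k, i = mid + offset, center + offset  # pattern index, sequence index
        if 0 <= k < len(sym_tokens) and 0 <= i < len(sequence):
            if sequence[i] in sym_tokens[k]:
                count += 1
    return count


def count_profile(sequence, sym_tokens, window=10):
    """Count of same amino acids with the pattern centred at every position."""
    return [same_amino_acid_count(sequence, sym_tokens, c, window)
            for c in range(len(sequence))]


# One-letter form of SEQ ID: 17: (G/A) Z G Z A (G/S) C (G/S) A Z G Z (G/A)
CYS_SYM = [("G", "A"), (), ("G",), (), ("A",), ("G", "S"), ("C",),
           ("G", "S"), ("A",), (), ("G",), (), ("G", "A")]

print(same_amino_acid_count("GKGKAGCGAKGKG", CYS_SYM, 6))            # 8
print(same_amino_acid_count("GKGKAGCGAKGKG", CYS_SYM, 6, window=5))  # 6
```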

This operation is carried out for each symmetrically related amino acid sequence pattern. Then, to specify the active sites in the reference sequences, requisites on "the number of the same amino acids" and on "the number of symmetrically related amino acid sequence patterns" are determined.

In a typical example of the invention, these requisites were determined as follows.

The first requisite:

A site or region at which, within 10 amino acids both upstream and downstream of the reference amino acid, "the number of the same amino acids" for the "symmetrically related amino acid sequence patterns" whose reference amino acids are Cys, His, Arg, Ala, Gly, Ser, Thr and Asn is more than 2, 2, 2, 2, 4, 3, 3 and 3, respectively, and at which the number of "symmetrically related amino acid sequence patterns" satisfying this condition is 3 or more;
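A sketch of the first requisite follows, with the counting function from the previous step repeated so the block is self-contained. The text says "more than" the listed numbers; since this may be a translation of "or more", the comparison is a parameter (`strict`), with `>=` as an assumed default. The dictionary of symmetric patterns and the helper `flanked` are hypothetical stand-ins.

```python
# Thresholds of the first requisite, keyed by one-letter reference amino acid.
FIRST_REQUISITE_THRESHOLDS = {"C": 2, "H": 2, "R": 2, "A": 2,
                              "G": 4, "S": 3, "T": 3, "N": 3}


def same_amino_acid_count(sequence, sym_tokens, center, window=10):
    """Matches of a symmetric pattern centred at `center`, reference position skipped."""
    mid = len(sym_tokens) // 2
    count = 0
    for offset in range(-window, window + 1):
        k, i = mid + offset, center + offset
        if offset != 0 and 0 <= k < len(sym_tokens) and 0 <= i < len(sequence):
            count += sequence[i] in sym_tokens[k]
    return count


def satisfies_first_requisite(sequence, center, sym_patterns, window=10,
                              min_patterns=3, strict=False):
    """First requisite at one position.

    sym_patterns maps a reference amino acid (one-letter) to its symmetric
    pattern tokens. A pattern qualifies when its count of same amino acids
    reaches the listed threshold (> if strict, else >=); the position
    qualifies when at least min_patterns patterns do.
    """
    qualifying = 0
    for x1, tokens in sym_patterns.items():
        n_same = same_amino_acid_count(sequence, tokens, center, window)
        threshold = FIRST_REQUISITE_THRESHOLDS[x1]
        if n_same > threshold or (not strict and n_same == threshold):
            qualifying += 1
    return qualifying >= min_patterns


def flanked(center_aa):
    """Hypothetical symmetric pattern: Gly at three positions on each side."""
    return [("G",)] * 3 + [(center_aa,)] + [("G",)] * 3


toy_patterns = {x1: flanked(x1) for x1 in ("C", "G", "S")}
print(satisfies_first_requisite("GGGGGGG", 3, toy_patterns))  # True
print(satisfies_first_requisite("KKKKKKK", 3, toy_patterns))  # False
```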

The second requisite:

A site or region at which, within 10 amino acids both upstream and downstream of the reference amino acid, "the number of the same amino acids" for the "symmetrically related amino acid sequence patterns" whose reference amino acids are Cys, His, Arg, Ala, Gly, Ser, Thr and Asn is 2, 2, 2, 2, 4, 3, 3 and 3, respectively, and the number of "symmetrically related amino acid sequence patterns" satisfying this condition is 2 or more, and at which, within 5 amino acids both upstream and downstream of the reference amino acid, "the number of the same amino acids" for the "symmetrically related amino acid sequence pattern" having Cys as the reference amino acid is 3 or more, or two or more cysteines are present in this range;

The third requisite:

A site or region at which, within 10 amino acids both upstream and downstream of the reference amino acid, "the number of the same amino acids" for the "symmetrically related amino acid sequence patterns" whose reference amino acids are Cys, His, Arg, Ala, Gly, Ser, Thr and Asn is 2, 2, 2, 2, 4, 3, 3 and 3, respectively, and the number of such "symmetrically related amino acid sequence patterns" is 2 or more, and at which, within 5 amino acids both upstream and downstream of the reference amino acid, none of the "symmetrically related amino acid sequence patterns" whose reference amino acids are Asp, Glu, Phe, Lys, Leu, Pro and Tyr has "the number of the same amino acids" of 2, 2, 3, 4, 3, 3 and 2, respectively;
The fourth requisite:

A site or region at which, within 10 amino acids both upstream and downstream of the reference amino acid, "the number of the same amino acids" for the "symmetrically related amino acid sequence patterns" whose reference amino acids are Cys, His, Arg, Ala, Gly, Ser, Thr and Asn is not 0, and such patterns appear 3 or more times in a row;

The fifth requisite: 

A site or region at which, within 10 amino acids both upstream and downstream of the reference amino acid, "the number of the same amino acids" for the "symmetrically related amino acid sequence patterns" whose reference amino acids are Cys, His, Arg, Ala, Gly, Ser, Thr and Asn is not 0, "the number of the same amino acids" for those whose reference amino acids are Asp, Glu, Phe, Lys, Leu, Pro and Tyr is also not 0, and such patterns appear 5 or more times in a row;

The sixth requisite: 

A site or region at which, within 10 amino acids both upstream and downstream of the reference amino acid, "the number of the same amino acids" for the "symmetrically related amino acid sequence patterns" whose reference amino acids are Asp, Glu, Phe, Lys, Leu, Pro and Tyr is 0, and at which such patterns are concentrated on these amino acids.

To surmise an active site in the test polypeptide, the "symmetrically related amino acid sequence patterns" are first superposed on the amino acid sequence of the test polypeptide, and the number of "same amino acids", i.e., coinciding amino acids other than the reference amino acid, is counted within 5 to 10 amino acids both upstream and downstream of the reference amino acid.

By applying the above first requisite to this result, a functional site or region in the amino acid sequence of the test polypeptide is surmised. If a reasonably narrow region cannot be specified in this way, the second requisite is then applied. If necessary, the third requisite, the fourth requisite and so on are applied in this order, and the above operation is repeated until an active region of reasonably narrow range can be surmised.
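The stepwise use of the requisites can be expressed as a loop over predicates. What counts as a "reasonably narrow" result is not quantified in the text, so `max_positions` below is an assumed parameter, and each requisite is applied to the whole sequence rather than to the output of the previous one; both are interpretation choices, and the two predicates in the usage example are hypothetical stand-ins for real requisites.

```python
from typing import Callable, List, Sequence, Tuple

# A requisite decides whether position i (0-based) of a sequence qualifies.
Requisite = Callable[[str, int], bool]


def surmise_sites(sequence: str, requisites: Sequence[Requisite],
                  max_positions: int = 10) -> Tuple[int, List[int]]:
    """Apply the requisites in order until the surmised set is narrow enough.

    Returns (index of the requisite used, qualifying positions). 'Narrow
    enough' is assumed to mean non-empty and at most max_positions residues;
    if no requisite achieves this, the last requisite's result is returned.
    """
    positions: List[int] = []
    for k, requisite in enumerate(requisites):
        positions = [i for i in range(len(sequence)) if requisite(sequence, i)]
        if 0 < len(positions) <= max_positions:
            return k, positions
    return len(requisites) - 1, positions


# Hypothetical stand-ins for the first and second requisites.
def too_broad(seq: str, i: int) -> bool:
    return True


def is_cys_or_his(seq: str, i: int) -> bool:
    return seq[i] in "CH"


print(surmise_sites("ACDEFGHIKL", [too_broad, is_cys_or_his], max_positions=5))
# (1, [1, 6])
```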

Incidentally, when the above "related amino acid sequence patterns" are classified into those related to the functional region and those related to the non-functional region, and the amino acids X1 and X2 are extracted for each region, the following results are obtained.

Supposed functional region-related patterns
X1 ... Ala, Cys, Gly, His, Asn, Arg, Ser, Thr
X2 ... Gly, Ala, Ser

Supposed non-functional region-related patterns
X1 ... Asp, Glu, Phe, Lys, Leu, Pro, Tyr
X2 ... Leu, Lys, Gly, Asp, Val
The inventors investigated the relation between the above X1 and X2 from a new point of view. For example, taking Ala (alanine), an X2 amino acid of the functional region-related patterns, one of its codons is "GCT". (Although a codon usually means a unit sequence consisting of three of the four kinds of bases composing an mRNA, the term is applied to DNA in this specification for convenience. The same applies to "sense" and "antisense".)

The antisense sequence against this codon is "CGA", and when this antisense is read in the 5' -> 3' direction, "AGC" is obtained, which encodes Ser (serine). Further, when "GCT", a codon of Ala, is read in the reverse direction (3' -> 5'), "TCG" is obtained, which also encodes Ser (serine), and when the antisense "CGA" is read in the reverse direction (3' -> 5'), a codon encoding Arg (arginine) is obtained. Both of these amino acids (Ser and Arg) are among the X1 amino acids of the functional region-related patterns.
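The four codon readings described above can be reproduced with the standard genetic code. A small Python sketch (the function names are illustrative):

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the first base varying slowest.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_codon(codon):
    """Amino acid encoded by a three-base DNA string read as a codon."""
    index = (BASES.index(codon[0]) * 16 + BASES.index(codon[1]) * 4
             + BASES.index(codon[2]))
    return THREE_LETTER[CODE[index]]


def related_readings(codon):
    """Amino acids obtained by reading a sense codon in the four ways described."""
    antisense = codon.translate(COMPLEMENT)  # complementary bases, aligned 3'->5'
    return {
        "sense 5'->3'": translate_codon(codon),
        "sense 3'->5'": translate_codon(codon[::-1]),
        "antisense 5'->3'": translate_codon(antisense[::-1]),
        "antisense 3'->5'": translate_codon(antisense),
    }


print(related_readings("GCT"))
```

For GCT it returns Ala for the sense reading, Ser for both the reversed sense and the antisense read 5' -> 3', and Arg for the antisense read 3' -> 5', matching the example in the text.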

A similar relation holds for Ser, another X2 amino acid of the functional region-related patterns. It is therefore surmised that the codons encoding the X1 amino acids and those encoding the X2 amino acids closely correspond to each other through the antisense relation. Consequently, by utilizing this relation, an active site in the test polypeptide can be surmised efficiently.

Therefore, the second method for surmising a functional site or region in the amino acid sequence of a test polypeptide makes use of this finding.

For this purpose, an amino acid sequence pattern represented by the following formula (II) is first supposed:

$$X_1'\,(Z)_m\,X_2' \qquad \text{(II)}$$

This amino acid sequence pattern is referred to as a "secondary amino acid sequence pattern" for convenience. In formula (II), X1' and X2' are specific amino acids selected from the 20 amino acids naturally existing in proteins, and X1' and X2' are related to each other, as described above, through their sense and antisense codons. In this respect, this restriction is essentially different from that of the supposed amino acid sequence patterns X1 (Z)n X2 [i.e., formula (I)]. The Z's are the same or different unspecified amino acids, and m is 0 or an integer from 1 to about 30. However, since X1' and X2' are related as above, it is sufficient to use m = 0 or 1, i.e., X1' X2' and X1' Z X2'.
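One way to enumerate candidate (X1', X2') pairs under the sense/antisense relation is sketched below. It assumes that X1' is related to X2' when some codon of X2', read reversed, read as antisense 5' -> 3', or read as antisense 3' -> 5', encodes X1' (stop codons are dropped). The text does not spell out the exact relation behind its count of 160 patterns, so the size of the resulting set should not be taken as reproducing that figure; function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), first base varying slowest.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(codon):
    """One-letter amino acid (or '*') for a three-base DNA string read as a codon."""
    return CODE[BASES.index(codon[0]) * 16 + BASES.index(codon[1]) * 4
                + BASES.index(codon[2])]


def related_pairs():
    """Pairs (x1p, x2p) where some codon of x2p, read reversed, as antisense
    5'->3', or as antisense 3'->5', encodes x1p (assumed relation)."""
    pairs = set()
    for codon in map("".join, product(BASES, repeat=3)):
        x2p = translate(codon)
        if x2p == "*":
            continue
        antisense = codon.translate(COMPLEMENT)
        for reading in (codon[::-1], antisense[::-1], antisense):
            x1p = translate(reading)
            if x1p != "*":
                pairs.add((x1p, x2p))
    return pairs


def secondary_patterns(pairs):
    """Expand each pair into the secondary patterns X1' X2' and X1' Z X2'."""
    return sorted({f"{a} {b}" for a, b in pairs} | {f"{a} Z {b}" for a, b in pairs})


pairs = related_pairs()
print(("S", "A") in pairs, ("R", "A") in pairs)  # True True
```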

These secondary amino acid sequence patterns are then applied to the reference amino acid sequences, and multivariate analysis is carried out to derive a discriminant function capable of discriminating an active site or region from other sites or regions.

Then, the thus obtained discriminant function can be applied to the amino acid sequence of the test 
polypeptide to surmise an active site or region in the amino acid sequence. 
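The text says only that multivariate analysis yields the discriminant function. As an illustration, the sketch below uses Fisher's linear discriminant, a standard two-class method; this choice, the small ridge term, and the idea that each residue is described by a vector of pattern counts in a window around it are assumptions, not the inventors' documented procedure. It is written in pure Python to stay self-contained, and the feature vectors are made up.

```python
def mean_vector(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def scatter_matrix(vectors, mu):
    """Sum over samples of (v - mu)(v - mu)^T."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [v[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]
    return s


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_discriminant(active, inactive, ridge=1e-6):
    """Return a classifier that is True on the 'active' side of the boundary.

    w = S_w^-1 (mu_active - mu_inactive); the threshold is the midpoint of the
    projected class means. `ridge` keeps S_w invertible for sparse counts.
    """
    mu1, mu0 = mean_vector(active), mean_vector(inactive)
    s1, s0 = scatter_matrix(active, mu1), scatter_matrix(inactive, mu0)
    d = len(mu1)
    sw = [[s1[a][b] + s0[a][b] + (ridge if a == b else 0.0) for b in range(d)]
          for a in range(d)]
    w = solve(sw, [mu1[j] - mu0[j] for j in range(d)])

    def project(v):
        return sum(wj * vj for wj, vj in zip(w, v))

    threshold = (project(mu1) + project(mu0)) / 2

    def is_active(v):
        return project(v) > threshold

    return is_active


# Toy feature vectors (e.g. counts of two patterns near each residue); made up.
active = [[3, 1], [4, 2], [3, 2], [4, 1]]
inactive = [[0, 0], [1, 1], [0, 1], [1, 0]]
is_active = fisher_discriminant(active, inactive)
print(is_active([4, 1]), is_active([0, 0]))  # True False
```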

As the first specific example of the invention, repetitive sequence patterns extracted from 160 kinds of secondary amino acid sequence patterns of the forms X1' X2' and X1' Z X2' (Z is selected from the 20 kinds of amino acids), together with 102 other kinds of patterns (the related amino acid sequence patterns, doublet sequence patterns taking a substitution frequency constant into consideration, etc.), were prepared and applied to the above 17 reference amino acid sequences to derive a discriminant function for surmising an active site or region.

When this discriminant function was applied to the above reference amino acid sequences, whose active sites were known, active regions could be distinguished from non-active regions with a discrimination rate of 84%. The preferential amino acid sequence patterns used in the discriminant function are shown below.



Namely, they are:

1. Ala Z Cys    2. Gly Ser    3. Val Z His    4. Val His    5. Arg Pro    6. Thr Z Arg
7. Val Tyr    8. Leu Asn    9. Gly Thr    10. Gly Z Ser    11. Ala Arg    12. Ser Arg
13. Ser Ser    14. Leu Z Gln    15. Z1 Z2 (total substitution frequency value ≥ 1.8)
16. Leu Z Asp    17. Val Gln    18. Met Z Tyr    19. Met Tyr    20. Ile Z Asp
21. Ile Asn    22. Thr Arg    23. a b (X)n b a, etc.

(wherein Z is any of the 20 amino acids. Z1 and Z2 denote two amino acids adjacent to each other in tandem in the amino acid sequence, in a combination whose substitution frequency values total 1.8 or more; the substitution frequency values are based on Literature 9. n is an integer from 0 to 30, and a and b are any of the 20 amino acids.)

To verify the effectiveness of the method of the invention established as above, the method was applied to many polypeptides, other than the above reference polypeptides, whose amino acid sequences and active sites were known, and the results were compared with the actual active sites identified. The results are shown in Table 2.

Table 2 shows the results obtained for many enzymes. In it, the "hit rate" represents the ratio of the number of surmised positions that are actual active sites to the number of positions surmised by the method of the invention, and the "recovery rate" represents the ratio of the number of actual active sites recovered to the total number of active sites existing in the enzymes. Table 3 shows the surmise efficiency for the active regions (i.e., binding regions) of proteins other than enzymes.
Table 2

Various enzymes              Number of   Number of     Hit rate          Recovery rate
                             enzymes     amino acids
1) Hydrolase
   Serine protease           9           3600          17/33             17/27
   Serine esterase           4           2027          2/14              2/6
   Aspartyl protease         6           2311          8/14              8/12
   Metalloprotease           5           2782          1/21              1/6
   Thiol protease            5           1986          6/12              6/10
   Glycosidase               5           3051          1/22              1/6
   Carboxypeptidase          4           2073          4/13              4/7
   Nuclease                  2           445           1/2               1/5
   Others                    (illegible) (illegible)   4/(illegible)     4/(illegible)
2) Lyase                     4           1749          5/13              5/5
3) Transferase               6           2463          5/17              5/8
4) Oxidoreductase            5           2247          5/13              5/8
5) Isomerase                 3           1845          2/14              2/5
6) Other                     4           1606          2/13              2/5
7) Inhibiting substance      2           817           2/4               2/2
Total                        70          31526         65/245 (27%)      65/118 (55%)
Table 3

No.  Physiologically active polypeptide   Number of amino acids   Hit rate       Recovery rate
1    PAI-1                                402                     1/2            1/1
2    Angiotensinogen                      484                     1/3            1/1
3    CD4                                  458                     1/5            1/1
4    HIV envelope gp120                   856                     0/4            0/1
5    Poliovirus capsid                    881                     1/5            1/1
6    Acetylcholine receptor               461                     2/4            2/2
7    Interleukin 2                        153                     1/1            2/5
8    Insulin receptor                     1382                    2/10           2/3
9    Insulin                              110                     2/2            2/2
10   Calcitonin (human)                   141                     1/1            1/1
     Calcitonin (salmon I)                136                     1/1            1/1
11   TNF-α                                233                     1/3            1/1
     Total                                5697                    14/41 (34%)    15/20 (75%)

As is apparent from Tables 2 and 3, the method of the invention gave a hit rate of 27% (recovery rate 55%) for the catalytically active sites of the enzymes, and a hit rate of 34% (recovery rate 75%) for the reported binding sites of the other physiologically active polypeptides.
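The percentages follow directly from the fractions in the tables; for example, the Table 3 totals give 14/41 ≈ 34% (hit rate) and 15/20 = 75% (recovery rate). A trivial helper, with a hypothetical name:

```python
from fractions import Fraction


def rate(numerator, denominator):
    """Return a rate as an exact fraction and as a percentage rounded to an integer."""
    return Fraction(numerator, denominator), round(100 * numerator / denominator)


# Totals of Table 3.
print(rate(14, 41))  # (Fraction(14, 41), 34)
print(rate(15, 20))  # (Fraction(3, 4), 75)
```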

The above method can be applied to any physiologically active polypeptide whose amino acid sequence is known. Further, when the nucleotide sequence encoding the amino acid sequence is known, it is also possible to surmise the active site or region by searching the amino acid sequence obtained when the sense sequence is read in the reverse direction (3' -> 5'), the amino acid sequence obtained when the antisense sequence against the coding sequence is read in the 5' -> 3' direction, or the amino acid sequence obtained when the antisense sequence is read in the reverse direction (3' -> 5').

To verify this point, the active sites of 14 of the 17 enzymes, namely those whose nucleotide sequences were entered in the BION/GenBank database, were surmised (for enzymes from different species, the active sites were identified by comparison with the homologous sequences of Table 1) by examining the amino acid sequences read in the following directions: 5' -> 3' (N; sense, 5' -> 3'), 3' -> 5' (C; antisense, 3' -> 5'), 5' -> 3' (IC; antisense, 5' -> 3') and 3' -> 5' (R; sense, 3' -> 5'). The results are shown in the following Tables 4 to 6.
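The four readings N, C, IC and R of a coding sequence can be generated as below, assuming the sequence is given 5' -> 3' on the sense strand, its length is a multiple of three, and the standard genetic code applies (stops shown as *). Function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), first base varying slowest.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate consecutive triplets of a DNA string (one-letter codes, '*' = stop)."""
    return "".join(
        CODE[BASES.index(dna[k]) * 16 + BASES.index(dna[k + 1]) * 4
             + BASES.index(dna[k + 2])]
        for k in range(0, len(dna) - 2, 3))


def four_readings(cds):
    """Amino acid sequences for readings N, R, IC and C of a sense strand given 5'->3'."""
    antisense = cds.translate(COMPLEMENT)  # complementary strand, written 3'->5'
    return {
        "N": translate(cds),               # sense, 5'->3'
        "R": translate(cds[::-1]),         # sense, 3'->5'
        "IC": translate(antisense[::-1]),  # antisense, 5'->3'
        "C": translate(antisense),         # antisense, 3'->5'
    }


print(four_readings("GCTAAA"))  # {'N': 'AK', 'R': 'KS', 'IC': 'FS', 'C': 'RF'}
```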

As described above, the active sites could be surmised with considerable frequency even when amino acid sequences other than the natural amino acid sequences were used. Further, it is readily expected that combining these four readings allows the active sites or regions to be surmised with greater efficiency.

To make the surmise method of the invention more precise, the procedures described above were repeated to obtain the following "related amino acid sequence patterns".

Namely, as "supposed functional region-related amino acid sequence patterns":



1. Cys (Gly/Ser) Ala Z (Gly/Arg) Z (Gly/Ala) (SEQ ID: 18)

2. His Z Z Ser (Gly/Ser) Z Gly Z Gly Z Gly (SEQ ID: 19)

3. Arg Ser Z Z Cys Ala Z Ser Z Z (Gly/Ala/Ser) (Ser/Val) (SEQ ID: 20)

4. Ala Thr (Ser/Thr/Val/Cys) Z (Thr/Leu/Asn) (Thr/Asn) Cys (Gly/Thr/Asn) (Thr/Glu) (Ile/Asn) (Gly/Arg) (Glu/Gln) Ala (SEQ ID: 21)

5. Gly (Gly/Ser/Thr) Gln Tyr (Asn/His/Cys) (Ser/Pro/Phe) (Gly/His/Cys) Ala (Ser/Ile/Asn) (Gly/Thr) (Ala/Val/Gly/Arg/His/Tyr) Gly (Gly/Ser) Val (SEQ ID: 22)

6. Ser (Gly/Lys/Glu/Cys/Ser/Arg) Ala (Ser/Leu/Asn/His) (Gln/Val/His/Phe) (Gly/Thr/Val) (Lys/Ser/Tyr) (Asn/Gln/Arg) (Gly/Thr) (Asn/Ile/Glu) (Gln/Ile/Ser/Arg) Arg (Asn/Gly) (Ile/Glu) (SEQ ID: 23)

7. Thr (Gly/Ala) (Ala/Ile) Thr (Ala/Val/Leu) (Ala/Ser/Leu) (Asp/Pro/Glu) (Ala/Thr/Leu) (Ala/Ser/Phe) (Gly/Ser/Val) (Asp/Leu/Val) (Asp/Lys/Ile) (Val/Leu) (SEQ ID: 24)

8. Asn (Thr/Leu) Leu (Val/Ser) (Gly/Ala) Ala (Lys/Ile) (Ser/Ala/Lys) Gly (Ser/Asp/Ala) Z Z (Ser/Lys) (SEQ ID: 25)

As "non-functional region-related amino acid sequence patterns": 



9. Asp Pro (Ala/Leu/Lys) (Ile/Phe) Leu Ala Phe Val Lys (Leu/Glu) (Ser/Leu/Glu) (Gly/Leu) Lys Phe (SEQ ID: 26)

10. Glu Z Ile Z Z (Lys/Phe/Ser) Z (Pro/Leu) Leu (Asp/Leu/Ile) (Gly/Asp) (Ser/Ile) (Gly/Lys/Phe) (SEQ ID: 27)

11. Phe (Gly/Leu) (Asp/Ser) (Ile/Asp) (Gly/Val) (Ser/Glu) (Val/Asp) Val Ala Z (Gly/Leu) (Val/Ser/Thr) (Gly/Glu) Asp (SEQ ID: 28)

12. Lys Leu (Asp/Asn/Pro/Lys/Thr) Z Z (Gly/Glu/Ser) (Leu/Ile) (Gly/Thr) (Asp/Lys/Thr) (Val/Leu/Pro/Asn) (Gly/Pro) Leu (Ser/Pro/Asp/Glu/Ile) Leu (SEQ ID: 29)

13. Leu (Lys/Phe) (Asp/Gln) (Ile/Tyr) (Gly/Asp/Tyr) (Arg/Asn) Glu (Glu/Leu/Tyr) (Asp/Lys/Glu/Ile) (Asp/Lys/Glu/Ile) (Asp/Ala/Val/Phe) (Asp/Lys) (Val/Leu) (Pro/Lys/Val) (SEQ ID: 30)

14. Pro (Asp/Ala/Ile) Lys Gly Z Thr (Ala/Asn) Ala (Val/Gly) (Gly/Ser/Lys) Lys Val (Gly/Ala/Lys/Ile) (Leu/Ala) (SEQ ID: 31)

15. Tyr Val Z Leu (Asn/Val/Leu) (Ser/Asn/Val) Z Asn Leu Thr Asp (SEQ ID: 32)

In the related amino acid sequence patterns above, the underlined amino acids are the reference amino acids X1. For the remaining positions, the percentage occurrence of each amino acid was calculated, and the amino acids contributing with high frequency to each related amino acid sequence pattern were extracted to give simplified patterns.
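The simplification step can be pictured as a per-position frequency count over the segments matched by one related pattern. The sketch below assumes the matched segments are already aligned to equal length and written in one-letter code, and it uses a hypothetical cut-off `min_fraction = 0.3`; the threshold actually used is not stated. Positions with no sufficiently frequent residue collapse to Z, and reference positions (X1) keep their most common residue.

```python
from collections import Counter


def simplify_pattern(segments, reference_positions, min_fraction=0.3):
    """Hypothetical sketch: collapse aligned segments into a simplified pattern."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("segments must be aligned to the same length")
    tokens = []
    for pos in range(length):
        counts = Counter(s[pos] for s in segments)
        if pos in reference_positions:
            # Reference amino acid X1: keep the most common (conserved) residue.
            tokens.append(counts.most_common(1)[0][0])
            continue
        # Residues whose share at this position reaches the (assumed) cut-off.
        frequent = sorted(aa for aa, n in counts.items() if n / len(segments) >= min_fraction)
        if not frequent:
            tokens.append("Z")
        elif len(frequent) == 1:
            tokens.append(frequent[0])
        else:
            tokens.append("(" + "/".join(frequent) + ")")
    return tokens
```

With four invented toy segments `["CGAKGLG", "CSAQGMA", "CGARGVG", "CSAEGWA"]` and position 0 as the reference, the sketch yields `['C', '(G/S)', 'A', 'Z', 'G', 'Z', '(A/G)']`, which has the same shape as simplified pattern 1-1.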

Thus, the following functional region-related amino acid sequence patterns were obtained:

1-1. Cys (Gly/Ser) Ala Z Gly Z (Gly/Ala) (SEQ ID: 33)

2-1. His Z Z Ser (Gly/Ser) Z Gly Z Gly Z Gly (SEQ ID: 34)

3-1. Arg Ser Z Z Z Ala Z Ser Z Z (Gly/Ala/Ser) (SEQ ID: 35)

4-1. Ala Z Ser Z Z Z Z Gly Z Z Gly Z Ala Z (SEQ ID: 36)

5-1. Gly (Gly/Ser) Z Z Z Ser Gly Ala Ser Gly (Ala/Gly) Gly (Gly/Ser) (SEQ ID: 37)

6-1. Ser (Gly/Ser) Ala Ser Z Gly Ser Z Gly Z Ser Z Gly (SEQ ID: 38)

7-1. Thr (Gly/Ala) Ala Z Ala (Ala/Ser) Z Ala (Ala/Ser) (Gly/Ser) (SEQ ID: 39)

8-1. Asn Z Z Ser (Gly/Ala) Ala Z (Ser/Ala) Gly (Ser/Ala) Z Z Ser (SEQ ID: 40)

and as non-functional region-related amino acid sequence patterns:

9-1. Asp Pro (Leu/Lys) Z Leu Z Z Val Lys Leu Leu Leu (Gly/Lys) (SEQ ID: 41)

10-1. Glu Z Z Z Z Lys Z Leu Leu (Asp/Leu) (Gly/Asp) Z (Gly/Lys) (SEQ ID: 42)

11-1. Phe (Gly/Leu) Asp Asp (Gly/Val) Z (Val/Asp) Val Z Z (Gly/Leu) Val Gly Asp (SEQ ID: 43)

12-1. Lys Leu (Asp/Lys) Z Z Gly Leu Gly (Asp/Lys) (Val/Leu) Gly Leu Asp Leu (SEQ ID: 44)

13-1. Leu Lys Asp Z (Gly/Asp) Z Z Z Leu (Asp/Lys) (Asp/Val) (Asp/Lys) (Val/Leu) (Lys/Val) (SEQ ID: 45)

14-1. Pro Asp Lys Gly Z Z Z Z (Val/Gly) (Gly/Lys) Lys Val (Gly/Lys) Leu (SEQ ID: 46)

15-1. Tyr Val Z Leu (Val/Leu) Val Z Z Leu Z Asp (SEQ ID: 47)
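A related pattern written in the notation above can be turned into a regular expression and scanned along a test sequence. In this sketch (an illustration, not the inventors' procedure) Z becomes a wildcard, a parenthesised group such as (Gly/Ser) becomes a character class, and the test sequence is assumed to be a string of one-letter codes. Overlapping matches are reported by attempting a match at every position; the function names are hypothetical.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert e.g. 'Cys (Gly/Ser) Ala Z' into a compiled regex 'C[GS]A.'."""
    parts = []
    for token in re.findall(r"\([^)]*\)|\S+", pattern):
        if token == "Z":
            parts.append(".")  # Z: any amino acid
        elif token.startswith("("):
            choices = token.strip("()").split("/")
            parts.append("[" + "".join(THREE_TO_ONE[c.strip()] for c in choices) + "]")
        else:
            parts.append(THREE_TO_ONE[token])
    return re.compile("".join(parts))


def find_pattern(pattern: str, sequence: str) -> list[int]:
    """1-based start positions of all (possibly overlapping) matches."""
    regex = pattern_to_regex(pattern)
    return [i + 1 for i in range(len(sequence)) if regex.match(sequence, i)]
```

For example, pattern 1-1 compiles to `C[GS]A.G.[GA]` and, in the toy sequence `"MKCSAQGLAW"`, matches starting at residue 3.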

In the related amino acid sequence patterns above, the underlined amino acids are the reference amino acids X1.

The amino acids belonging to X1 and X2, respectively, were extracted from these related amino acid sequence patterns, with the following results.

Supposed functional region-related patterns:
X1 ... Cys, His, Arg, Ala, Gly, Ser, Thr, Asn
X2 ... Gly, Ala, Ser
Supposed non-functional region-related patterns:
X1 ... Asp, Glu, Phe, Lys, Leu, Pro, Tyr
X2 ... Leu, Lys, Gly, Asp, Val
Next, the following results were obtained by examining these groups from the viewpoint of the relation between sense and antisense.

For example, consider Ala (alanine), a member of X2 in the supposed functional region-related patterns. One of its codons is GCT. (Strictly, a codon is a unit of three of the four bases composing mRNA; for convenience, the term is also applied to DNA in this specification.) The antisense of this codon is CGA. Read in the direction 5' → 3', this triplet is AGC, which codes for Ser (serine); read in the direction 3' → 5', CGA codes for Arg (arginine).

When GCT, the Ala codon, is itself read in the reverse direction (3' → 5'), TCG is obtained, which also codes for Ser (serine). All of these amino acids (Ser and Arg) belong to the X1 group of the supposed functional region-related amino acid sequence patterns. Similar relations hold for Lys (lysine), taken as an example of X2 in the supposed non-functional region-related patterns. One of its codons is AAG, whose antisense is TTC. Read in the direction 5' → 3', this triplet is CTT, which codes for Leu (leucine); read in the direction 3' → 5', TTC codes for Phe (phenylalanine).

When AAG, the Lys codon, is read in the reverse direction (3' → 5'), GAA is obtained, which codes for Glu (glutamic acid). All of these amino acids (Leu, Phe and Glu) belong to the X1 group of the supposed non-functional region-related amino acid sequence patterns. It can therefore be surmised that the amino acids of X1 and X2 are related to each other through sense and antisense, and by using this relation, an active site in a test polypeptide can be surmised efficiently.
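The codon-level relations in the alanine and lysine examples can be checked mechanically. The sketch below (standard genetic code, one-letter codes, hypothetical names) derives, for a single codon, the residues obtained by reading it backwards (R), reading its complement (C) and reading its reverse complement (IC), and compares them with the X1 groups listed above. The text cites only one codon per amino acid; synonymous codons can give other residues, so the check applies codon by codon.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# X1 groups from the lists above, in one-letter code.
X1_FUNCTIONAL = set("CHRAGSTN")     # Cys His Arg Ala Gly Ser Thr Asn
X1_NON_FUNCTIONAL = set("DEFKLPY")  # Asp Glu Phe Lys Leu Pro Tyr


def related_residues(codon: str) -> dict[str, str]:
    """Residues encoded by the three non-sense readings of one codon."""
    codon = codon.upper()
    comp = codon.translate(COMPLEMENT)
    return {
        "R": CODE[codon[::-1]],  # sense read 3'->5'
        "C": CODE[comp],         # antisense read 3'->5'
        "IC": CODE[comp[::-1]],  # antisense read 5'->3'
    }


def falls_in(codon: str, group: set[str]) -> bool:
    """True if every related residue of the codon lies in the given X1 group."""
    return set(related_residues(codon).values()) <= group
```

GCT gives Ser, Arg and Ser, all in the functional X1 group, and AAG gives Glu, Phe and Leu, all in the non-functional X1 group. The synonymous Ala codon GCC, however, gives Pro on reversal, which is why the check is made per codon.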

An attempt to find specific active sites of a polypeptide using such relations was made in 1981 by J. Biro (Literature 11), who proposed the hypothesis that informational complementarity exists in the intermolecular or intramolecular specific interaction regions of proteins. Intramolecular active sites (within the same protein) were discussed using 9 polypeptides, each composed of 115 or fewer amino acids.

However, it was concluded that within the same molecule, regions having informational complementarity extend over the whole molecule, so that this information does not distinguish the active sites from other regions. It is therefore difficult to identify the active sites from the amino acid sequences of their precursors.

Accordingly, the present inventors first supposed an amino acid sequence pattern represented by the following formula (II):

$$X_1'\,(Z)_m\,X_2' \qquad \text{(II)}$$

For convenience, this amino acid sequence pattern is referred to as a supposed complementary amino acid sequence pattern. In formula (II), X1' and X2' are specific amino acids selected from the 20 amino acids naturally occurring in proteins, and they are complementary to each other, being related through sense and antisense as described above. In this respect the restriction is essentially different from the supposed amino acid sequences represented by formula (I). Z represents the same or different unspecified amino acids, and m is 0 or an integer from 1 to about 30. However, since X1' and X2' are related as described above, it is sufficient to use m = 0 or 1, i.e., X1'X2' and X1'ZX2'. Thus, 160 supposed complementary amino acid sequence patterns of the forms X1'X2' and X1'ZX2' were prepared (X1' and X2' being selected from the 20 amino acids).
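Counting occurrences of a supposed complementary pattern of formula (II) with m = 0 or 1 is straightforward. The sketch below is illustrative only: the specific (X1', X2') pairs that make up the 160 patterns are not listed here, so the pair is supplied by the caller, and the sequence is assumed to be in one-letter code. `count_pair` is a hypothetical name.

```python
def count_pair(sequence: str, x1: str, x2: str, m: int) -> int:
    """Count occurrences of x1, then exactly m arbitrary residues (Z), then x2."""
    if m < 0:
        raise ValueError("m must be non-negative")
    gap = m + 1  # distance from x1 to x2
    return sum(
        1
        for i in range(len(sequence) - gap)
        if sequence[i] == x1 and sequence[i + gap] == x2
    )
```

For the toy sequence `"GSAGTSGAS"`, the pattern Gly Ser (m = 0) occurs once and Gly Z Ser (m = 1) occurs twice.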

The present inventors further investigated related and unrelated amino acid sequence patterns extensively.

As a result, 102 amino acid sequence patterns were prepared on the basis of repetitive sequences such as ab(Z)m ab and ab(Z)m ba (where a, b and Z are each any of the 20 amino acids, and m is an integer from 0 to 15), a substitution frequency constant disclosed in the literature (Literature 9), and doublet sequences composed of identical or analogous amino acids (for example, Leu Leu, Leu Ile, His His, Trp Trp, Phe Trp, Pro Pro, Lys Lys and Arg Lys).
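The repetitive patterns ab(Z)m ab and ab(Z)m ba can be enumerated directly. This sketch scans every adjacent pair ab and looks for a second copy, in the same or reversed order, after m = 0 to 15 intervening residues; a pair with a = b is reported only once (as ab(Z)m ab). It does not cover the substitution-frequency or doublet criteria, whose values come from Literature 9. The function name is hypothetical and one-letter codes are assumed.

```python
def repeat_patterns(sequence: str, max_m: int = 15) -> list[tuple[int, str, int]]:
    """Return (1-based start, kind, m) for each ab(Z)m ab or ab(Z)m ba occurrence."""
    hits = []
    n = len(sequence)
    for i in range(n - 1):
        a, b = sequence[i], sequence[i + 1]
        for m in range(max_m + 1):
            j = i + 2 + m  # start of the second doublet
            if j + 1 >= n:
                break
            second = sequence[j:j + 2]
            if second == a + b:
                hits.append((i + 1, "ab(Z)m ab", m))
            elif second == b + a and a != b:
                hits.append((i + 1, "ab(Z)m ba", m))
    return hits
```

In `"KLAKL"` the doublet KL recurs after one residue, giving `(1, "ab(Z)m ab", 1)`; in `"KLLK"` the reversed doublet LK follows immediately, giving `(1, "ab(Z)m ba", 0)`.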

These supposed complementary amino acid sequence patterns were then applied to the amino acid sequences of the reference polypeptides, and a multivariate analysis was carried out to induce a statistically stable discriminant function capable of distinguishing an active site or region from other regions. The discriminant function thus obtained can be applied to the amino acid sequence of a test polypeptide to surmise an active site or region in that sequence.

As the first specific example of the invention, the 262 amino acid sequence patterns described above (the 160 supposed complementary patterns plus the 102 further patterns) were applied to the amino acid sequences of the 48 reference polypeptides listed in Table 7 (the DNA sequences were taken from the BION/genbank database), and a discriminant function for surmising an active site or region was induced (the SAS DISCRIM procedure, version 5, was used as the analytical method).
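The discriminant step can be pictured as a parametric linear discriminant with a pooled covariance matrix and user-set prior probabilities, which is the kind of rule SAS PROC DISCRIM fits by default; the exact options the inventors used are not stated. The NumPy sketch below assumes each residue has already been turned into a numeric feature vector (for example, counts of each pattern near that residue) and labelled "Near" or "Far"; the "Far" label, the feature encoding and the class name are assumptions for illustration.

```python
import numpy as np


class LinearDiscriminant:
    """Hypothetical sketch: linear discriminant with pooled covariance and priors."""

    def fit(self, X, y, priors):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = sorted(priors)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # Pooled within-class covariance matrix.
        centered = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in self.classes])
        cov = centered.T @ centered / (len(X) - len(self.classes))
        self.cov_inv = np.linalg.pinv(cov)
        self.log_priors = np.log([priors[c] for c in self.classes])
        return self

    def scores(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Linear score per class: x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + ln p_k
        lin = X @ self.cov_inv @ self.means.T
        const = -0.5 * np.einsum("ij,jk,ik->i", self.means, self.cov_inv, self.means)
        return lin + const + self.log_priors

    def predict(self, X):
        return np.array(self.classes)[np.argmax(self.scores(X), axis=1)]
```

Lowering the prior for "Near" (for example to 0.109) shifts borderline residues toward "Far", which is one way the reported discrimination percentages can vary with the a priori probability of Near.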
Table 7

No.  Protein name                                Residue number  Active site
1    Coagulation factor IX                       461             267, 315,
2    Cathepsin G                                 255             62, 106,
3    Acetylcholinesterase                        649             276
4    Cholinesterase                              602             226
5    Cutinase                                    228             99, 140,
6    Esterase-6                                  548             209
7    Aspartate protease                          380             84, 269
8    Rhizopus pepsin                             352             60, 245
9    Pepsinogen A                                385             91, 274
10   Renin                                       401             101, 286
11   Collagenase                                 469             219
12   Enkephalinase                               749             584, 637
13   Cell surface protease                       601             255
14   Stromelysin                                 477             219
15   Thiol protease aleurain                     362             168, 308
16   Calpain I large (catalytic) subunit         714             115, 272
17   Cysteine proteinase 1                       343             142, 286
18   Papain                                      345             158, 292
19   Endoglucanase EG1                           256             149
20   Lysosomal α-glucosidase                     951             517
21   Ricin                                       565             201
22   Shiga-like toxin I                          315             189
23   Carboxypeptidase Y                          532             257, 508
24   Penicillin-binding protein 6                400             66
25   KEX1 carboxypeptidase                       729             198, 470
26   Micrococcus nuclease                        207             93, 101, 145
27   Ribonuclease RJ3                            238             52, 125
28   Cephalosporinase                            381             84
29   ATP synthetase α chain                      513             373
30   Fructose-1,6-biphosphatase                  338             275
31   Alkaline phosphatase                        471             124
32   Phosphate biphosphate aldolase              364             364
33   Argininosuccinate lyase                     464             51
34   α-Enolase                                   434             190, 412
35   Ribulose biphosphate carboxylase            487             204
36   Chloramphenicol acetyltransferase           213             189
37   Galactose-1-phosphate uridyltransferase     365             179, 181
38   ADP-glucose synthetase                      431             39, 195
39   Creatine kinase                             381             283
40   Cytochrome c peroxidase                     362             116, 120, 243
41   Malic acid dehydrogenase                    312             150, 177
42   Aldehyde dehydrogenase                      497             298
43   Glutamate dehydrogenase                     558             183
44   DNA gyrase subunit A                        821             123
45   Biphosphoglycerate mutase                   259             10, 61, 187
46   DNA topoisomerase                           765             723
47   Subtilisin BPN'                             382             139, 171, 328
48   Urokinase                                   431             224, 275, 376

* The position was quoted from the BION/swiss-prot database.

As a result, a discriminant function with a discrimination percentage of 89.2% (a priori probability of Near: 10.9%) was induced using 113 of the 262 patterns. The present inventors also induced, for convenience, discriminant functions with discrimination percentages of 88.6%, 85.7% and 73.1% (a priori probabilities of Near: 20%, 30% and 50%, respectively) using the same patterns. The amino acid patterns used preferentially in these discriminant functions include the following:

1. Gly Z Ser   2. Gly Thr   3. Val Z His
4. Ala Z Cys   5. Ala Ser   6. Ser Z Arg   7. Leu Gln
8. Val Tyr   9. Thr Z Trp   10. Thr Cys   11. Leu Z Asp
12. Gly Z Thr   13. Leu Z Gln   14. Glu Phe   15. Glu Z Phe
16. Arg Z Pro   17. Val Gln   18. Met Z His   19. Z1 Z2 (≥ 1.8)
20. Ser Ser, etc.

In the above patterns, Z may be any of the 20 amino acids. Z1 and Z2 denote amino acids arranged in tandem in the amino acid sequence, in any combination whose substitution frequency values total 1.8 or more; the substitution frequency values were taken from Literature 9. Among the preferential amino acid patterns (between 17 and 48 differential patterns were used), the large majority are of formula (II). In particular, Gly Z Ser, Gly Thr, Val Z His, Val Tyr, Leu Z Asp, Leu Z Gln, Val Gln and Ser Ser rank high in every one of the discriminant functions.

As the second specific example of the invention, the same 262 amino acid sequence patterns were applied to the amino acid sequences derived from the antisense strands of the 48 reference polypeptides to induce a discriminant function for surmising an active site or region (the analytical method was the same as above). Because such proteins do not actually occur in nature, the analysis for discriminating their active sites was carried out on the assumption that the amino acid on the antisense that corresponds to the amino acid at an identified position on the sense is the amino acid at the supposed active site.

On this assumption, a discriminant function was induced in the same way as above. As a result, a discriminant function with a discrimination percentage of 89% (a priori probability of Near: 10.9%) was induced using 95 of the 262 patterns. The present inventors also induced discriminant functions with discrimination percentages of 88.3%, 86.4% and 68.3% (a priori probabilities of Near: 20%, 30% and 50%, respectively) using the same patterns.
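Under the assumption just described, a residue position identified on the sense reading has to be carried over to the other readings. A minimal sketch, assuming the coding sequence is exactly 3 × n codons long and read in frame: readings in the same direction as the sense strand (N, C) keep the codon index, while readings in the opposite direction (R, IC) reverse it. The function name is hypothetical.

```python
def map_position(residue: int, n_codons: int, reading: str) -> int:
    """Map a 1-based sense residue number to the corresponding residue in another reading."""
    if not 1 <= residue <= n_codons:
        raise ValueError("residue out of range")
    if reading in ("N", "C"):   # same direction as the sense strand
        return residue
    if reading in ("R", "IC"):  # opposite direction: codon order is reversed
        return n_codons + 1 - residue
    raise ValueError(f"unknown reading: {reading}")
```

For instance, in a 100-codon coding sequence, sense residue 5 corresponds to residue 96 of the IC reading and to residue 5 of the C reading.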

The discriminant functions obtained by the first and second methods of the invention can be applied, individually and/or in combination, to any physiologically active polypeptide (or polynucleotide) whose amino acid sequence (or gene sequence) is known.

As exemplified in the following examples, according to the method of the invention, it is possible to efficiently 
surmise functional sites or regions of physiologically active polypeptides (or polynucleotides) whose amino acid 
sequences (or nucleic acid sequences) are known but whose active sites or regions are unknown, and thus 
the method is very useful for modification, improvement or the like of polypeptides (or polynucleotides). 

The effectiveness of this invention is demonstrated by the following examples.

Example 1 

To verify the effectiveness of the method of the invention established as above, the method was applied to 14 enzymes whose active sites were known (listed in Table 8; they are the same as in Tables 4 to 6), and the surmised results were compared with the actual active sites. The results are shown in Figures 1 to 14. The active sites of these enzymes were determined on the basis of homologous sequences of analogous enzymes (Table 1).
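The comparison between surmised and actual sites amounts to counting hits and reporting them as a fraction, in the style of figures such as 14/41 (34%) reported above. In this sketch a known site counts as hit if any surmised position lies within ±`window` residues of it; `window` is a hypothetical parameter, since the text does not define a hit here, and both function names are illustrative.

```python
def count_hits(actual_sites, predicted_positions, window=0):
    """Return (hits, total): known sites with a surmised position within +/- window."""
    predicted = set(predicted_positions)
    hits = sum(
        1
        for site in actual_sites
        if any(site + d in predicted for d in range(-window, window + 1))
    )
    return hits, len(actual_sites)


def format_rate(hits, total):
    """Format a hit rate as 'hits/total (percent%)'."""
    return f"{hits}/{total} ({round(100 * hits / total)}%)"
```

With invented sites `[10, 50, 90]` and surmised positions `[8, 49, 70]`, a window of 2 gives 2 hits out of 3, while an exact match (window 0) gives none; `format_rate(14, 41)` renders as `14/41 (34%)`.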

Table 8

No.  Protein name                                       Residue number  Action site
1    Ribonuclease (bovine)                              150             38, 67, 145
2    Trypsinogen (canine)                               246             63, 107, 200
3    Carboxypeptidase (rat)                             419             273, 358, 380
4    Carbonic anhydrase I (human)                       261             107, 200
6    Superoxide dismutase (human)                       154             64, 144
8    Triosephosphate isomerase (chicken)                248             165
9    Alcohol dehydrogenase (human)                      375             49, 52
10   Glutathione reductase (Escherichia coli)           450             42, 47, 439
12   Catalase (rat)                                     527             75, 148
13   Lysozyme C (bovine)                                147             53, 71
14   Actinidin (kiwi fruit)                             220             25, 162
15   Glyceraldehyde 3-phosphate dehydrogenase (human)   335             152, 179
16   Phospholipase A2 (bovine)                          145             70, 121
17   Thermolysin (bacterium)                            552             379, 467



Example 2 

The functional regions (for example, binding sites) of proteins other than enzymes, such as cytokines, hormones and receptors, were verified using the above discriminant functions. The results are shown in Tables 9 to 13.

EP 0 494 502 A1 



[Tables 9 to 13]



Example 3 

As a typical example of a substance other than a protein, the surmised result is described for the catalytically active site of tobacco ringspot virus (TRSV), one of the ribozymes. TRSV is known to be an RNA composed of 359 nucleotides, and its catalytically active site lies within the 50 nucleotides from the thymine nucleotide at position 175 to the thymine nucleotide at position 224. The amino acid sequence obtained by translating this nucleotide sequence is as follows.

Thr Gly Cys Ala Phe Arg Ser Asp Glu Ser Val Arg Thr
Lys Gln Asp Cys Gln Val Ala Glu Ser His His Val Asn
* Thr Val Leu Arg Ser Val Gly Val Cys Tyr Leu Val
Gly Gly Gly Asp Cys Ser Leu Arg Val Gly Ala Ala Val
Leu Val Lys Ala Tyr Gln Val Ile Tyr His Asn Val
Cys Phe Ser Gly • Leu Leu Cys Leu Leu Cys His Trp
Phe Pro Asp Leu Ala Leu Ala Ala Thr Gly Tyr Ser His
Ser Thr Trp Lys Phe Glu Arg Pro Arg Leu Tyr Thr Met
Arg Gly Glu Ser Lys Leu Phe • Pro Asp Thr Leu
(SEQ ID: 48)

The result obtained by applying the above discriminant function to this sequence is shown in Fig. 15.
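Relating the nucleotide span 175 to 224 to residue numbers in a translated sequence is simple codon arithmetic. The sketch assumes translation starts in frame at nucleotide 1 with one symbol per codon; the text does not state which frame was used for the sequence above, so the mapping is illustrative only, and the function name is hypothetical.

```python
def nucleotide_span_to_codons(first_nt: int, last_nt: int, frame_start: int = 1) -> tuple[int, int]:
    """Map a 1-based inclusive nucleotide span to the 1-based codons that overlap it."""
    if last_nt < first_nt or first_nt < frame_start:
        raise ValueError("invalid span")
    # Codon k covers nucleotides frame_start + 3(k-1) .. frame_start + 3k - 1.
    first = (first_nt - frame_start) // 3 + 1
    last = (last_nt - frame_start) // 3 + 1
    return first, last
```

Under this frame assumption, nucleotides 175 to 224 fall in codons 59 to 75.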

As described above, the method of the invention makes it possible to surmise effectively the active site of a physiologically active polypeptide whose amino acid sequence is known but whose active site is not, and the method is therefore a very useful tool for understanding the functional mechanisms of polypeptides and for modifying and improving them.
SEQUENCE LISTING

SEQ ID NO: 1

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: yes

(ix) FEATURE:

(D) OTHER INFORMATION:

Xaa2: Gly or Ser; Xaa7: Gly or Ala;

Xaa4 and Xaa6: any amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1

Cys Xaa Ala Xaa Gly Xaa Xaa
1 5

SEQ ID NO: 2

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes
(ix) FEATURE:

(D) OTHER INFORMATION:

Xaa2, Xaa3, Xaa6, Xaa8, Xaa10: any amino acid; Xaa5: Gly or Ser

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2

His Xaa Xaa Ser Xaa Xaa Gly Xaa Gly Xaa Gly
1 5 10

SEQ ID NO: 3

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: yes

(ix) FEATURE:

(D) OTHER INFORMATION:

Xaa3, Xaa4, Xaa5, Xaa7, Xaa9, Xaa10: any amino acid; Xaa11: Gly, Ala or Ser

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3

Arg Ser Xaa Xaa Xaa Ala Xaa Ser Xaa Xaa Xaa
1 5 10



SEQ ID NO: 4

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes

(ix) FEATURE:

(D) OTHER INFORMATION:

All Xaa: any amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4

Ala Xaa Ser Xaa Xaa Xaa Xaa Gly Xaa Xaa Gly Xaa
1 5 10
Ala Xaa

SEQ ID NO: 5

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes
(ix) FEATURE:

(D) OTHER INFORMATION:

Xaa3, Xaa4, Xaa5: any amino acid; Xaa2: Gly or Ser; Xaa11: Ala or Gly; Xaa13: Gly or Ser

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5

Gly Xaa Xaa Xaa Xaa Ser Gly Ala Ser Gly Xaa Gly
1 5 10
Xaa

SEQ ID NO: 6

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes
(ix) FEATURE:

(D) OTHER INFORMATION:

Xaa5, Xaa8, Xaa10, Xaa12: any amino acid;

Xaa2: Gly or Ser

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6

Ser Xaa Ala Ser Xaa Gly Ser Xaa Gly Xaa Ser Xaa
1 5 10
Gly

SEQ ID NO: 7 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa7: any amino acid; Xaa2: Gly

or Ala; Xaa6: Ala or Ser; Xaa9: Ala

or Ser; Xaa10: Gly or Ser

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7 

Thr Xaa Ala Xaa Ala Xaa Xaa Ala Xaa Xaa 
1 5 10

SEQ ID NO: 8 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa3, Xaa7, Xaa11, Xaa12: any amino

acid; Xaa5: Gly or Ala; Xaa8: Ser or

Ala; Xaa10: Ser or Ala

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8

Asn Xaa Xaa Ser Xaa Ala Xaa Xaa Gly Xaa Xaa Xaa 

1 5 10
Ser 

SEQ ID NO: 9 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa6, Xaa7: any amino acid; Xaa3:

Leu or Lys; Xaa13: Gly or Lys

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9 

Asp Pro Xaa Xaa Leu Xaa Xaa Val Lys Leu Leu Leu 

1 5 10
Xaa 

SEQ ID NO: 10 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 



(B) TYPE: amino acid 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa3, Xaa4, Xaa5, Xaa7, Xaa12: any

amino acid; Xaa10: Asp or Leu; Xaa11:

Gly or Asp; Xaa13: Gly or Lys

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10

Glu Xaa Xaa Xaa Xaa Lys Xaa Leu Leu Xaa Xaa Xaa 

1 5 10

Xaa 
SEQ ID NO: 11

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa6, Xaa9, Xaa10: any amino acid; Xaa2:

Gly or Leu; Xaa5: Gly or Val; Xaa7: Val

or Asp; Xaa11: Gly or Leu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 

Phe Xaa Asp Asp Xaa Xaa Xaa Val Xaa Xaa Xaa Val 
1 5 10
Gly Asp
SEQ ID NO: 12 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa5: any amino acid; Xaa3: Asp
or Lys; Xaa9: Asp or Lys; Xaa10: Val
or Leu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 

Lys Leu Xaa Xaa Xaa Gly Leu Gly Xaa Xaa Gly Leu 

1 5 10
Asp Leu 

SEQ ID NO: 13 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa6, Xaa7, Xaa8: any amino acid;
Xaa5: Gly or Asp; Xaa10: Asp or Lys;

Xaa11: Asp or Val; Xaa12: Asp or Lys;

Xaa13: Val or Leu; Xaa14: Lys or Val

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 

Leu Lys Asp Xaa Xaa Xaa Xaa Xaa Leu Xaa Xaa Xaa 

1 5 10
Xaa Xaa 

SEQ ID NO: 14 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa5, Xaa6, Xaa7, Xaa8: any amino acid;

Xaa9: Val or Gly; Xaa10: Gly or Lys;

Xaa13: Gly or Lys

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 

Pro Asp Lys Gly Xaa Xaa Xaa Xaa Xaa Xaa Lys Val 

1 5 10

Xaa Leu 

SEQ ID NO: 15 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa3, Xaa7, Xaa8, Xaa10: any amino acid;

Xaa5: Val or Leu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 

Tyr Val Xaa Leu Xaa Val Xaa Xaa Leu Xaa Asp 
1 5 10 

SEQ ID NO: 16

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa6: any amino acid; Xaa2: Gly

or Ser; Xaa7: Gly or Ala

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16

Cys Xaa Ala Xaa Gly Xaa Xaa 
1 5 

SEQ ID NO: 17 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa4, Xaa10, Xaa12: any amino acid;

Xaa1: Gly or Ala; Xaa6: Gly or Ser;

Xaa8: Gly or Ser; Xaa13: Gly or Ala

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 

Xaa Xaa Gly Xaa Ala Xaa Cys Xaa Ala Xaa Gly Xaa 

1 5 10
Xaa 

SEQ ID NO: 18 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa6: any amino acid; Xaa2: Gly or
Ser; Xaa5: Gly or Arg; Xaa7: Gly or
Ala

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 

Cys Xaa Ala Xaa Xaa Xaa Xaa 
1 5 

SEQ ID NO: 19 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa3, Xaa6, Xaa8, Xaa10: any amino

acid; Xaa5: Gly or Ser 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 

His Xaa Xaa Ser Xaa Xaa Gly Xaa Gly Xaa Gly 
1 5 10 

SEQ ID NO: 20 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa3, Xaa4, Xaa7, Xaa9, Xaa10: any amino
acid; Xaa11: Gly, Ala or Ser; Xaa12:
Ser or Val

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20

Arg Ser Xaa Xaa Cys Ala Xaa Ser Xaa Xaa Xaa Xaa
1 5 10 

SEQ ID NO: 21 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa3: any amino acid; Xaa2: Ser, Thr,
Val or Cys; Xaa4: Thr, Leu or Asn;
Xaa5: Thr or Asn; Xaa7: Gly, Thr or
Asn; Xaa8: Thr or Glu; Xaa9: Ile or
Asn; Xaa10: Gly or Arg; Xaa11: Glu or
Glum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 

Ala Thr Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa 

1 5 10
Ala 

SEQ ID NO: 22 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Gly, Ser or Thr; Xaa5: Asn, His
or Cys; Xaa6: Ser, Pro or Phe; Xaa7:
Gly, His or Cys; Xaa8: Ile or Asn;
Xaa9: Gly or Thr; Xaa10: Ala, Val, Gly,
Arg, His, Tyr; Xaa12: Gly or Ser

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22 

Gly Xaa Gln Tyr Xaa Xaa Xaa Ala Xaa Xaa Xaa Gly

1 5 10
Xaa Val 

SEQ ID NO: 23 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Gly, Lys, Glu, Cys, Ser or Arg;
Xaa4: Ser, Leu, Asn or His; Xaa5: Glu,
Val, His or Phe; Xaa6: Gly, Thr or Val;
Xaa7: Lys, Ser or Tyr; Xaa8: Asn, Gln
or Arg; Xaa9: Gly or Thr; Xaa10: Asn,
Ile or Glu; Xaa11: Gln, Ile, Ser or Arg;
Xaa13: Asn or Gly; Xaa14: Ile or Glu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23

Ser Xaa Ala Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Arg 

1 5 10
Xaa Xaa 

SEQ ID NO: 24 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Gly or Ala; Xaa3: Ala or Ile;
Xaa5: Ala, Val or Leu; Xaa6: Ala, Ser
or Leu; Xaa7: Asp, Pro or Glu; Xaa8:
Ala, Thr or Leu; Xaa9: Ala, Ser or Phe;
Xaa10: Gly, Ser or Val; Xaa11: Asp, Leu
or Val; Xaa12: Asp, Lys or Ile; Xaa13:
Val or Leu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 

Thr Xaa Xaa Thr Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

1 5 10
Xaa 

SEQ ID NO: 25 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Thr or Leu; Xaa4: Val or Ser;

Xaa5: Gly or Ala; Xaa7: Lys or Ile;

Xaa8: Ser, Ala or Lys; Xaa10: Ser, Asp

or Alu; Xaa13: Ser or Lys; Xaa11,

Xaa12: any amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 

Asn Xaa Leu Xaa Xaa Ala Xaa Xaa Gly Xaa Xaa Xaa 
1 5 10



Xaa 

SEQ ID NO: 26 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes
(ix) FEATURE

(D) OTHER INFORMATION:
Xaa3: Ala, Leu or Lys; Xaa4: Ile or

Phe; Xaa10: Leu or Glu; Xaa11: Ser,
Leu or Glu; Xaa12: Gly or Leu
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26

Asp Pro Xaa Xaa Leu Ala Phe Val Lys Xaa Xaa Xaa

1 5 10
Lys Phe 

SEQ ID NO: 27

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa4, Xaa5, Xaa7: any amino acid;
Xaa6: Lys, Phe or Ser; Xaa8: Pro or
Leu; Xaa10: Asp, Leu or Ile; Xaa11:
Gly or Asp; Xaa12: Ser or Ile; Xaa13:
Gly, Lys or Phe
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27

Glu Xaa Ile Xaa Xaa Xaa Xaa Xaa Leu Xaa Xaa Xaa

1 5 10 

Xaa 

SEQ ID NO: 28 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Gly or Leu; Xaa3: Asp or Ser;
Xaa4: Ile or Asp; Xaa5: Gly or Val;
Xaa6: Ser or Glu; Xaa7: Val or Asp;
Xaa10: any amino acid; Xaa11: Gly or
Leu; Xaa12: Val, Ser or Thr; Xaa13:
Gly or Glu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28

Phe Xaa Xaa Xaa Xaa Xaa Xaa Val Ala Xaa Xaa Xaa

1 5 10 

Xaa Asp 
SEQ ID NO: 29

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa3: Asp, Asn, Pro, Lys or Thr; Xaa4,
Xaa5: any amino acid; Xaa6: Gly, Glu or
Ser; Xaa7: Leu or Ile; Xaa8: Gly or
Thr; Xaa9: Asp, Lys or Thr; Xaa10:
Val, Leu, Pro or Asn; Xaa11: Gly or Pro;
Xaa13: Ser, Pro, Asp, Glu or Ile

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29

Lys Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Leu 

1 5 10
Xaa Leu 

SEQ ID NO: 30 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Lys or Phe; Xaa3: Asp or Gln;
Xaa4: Ile or Tyr; Xaa5: Gly, Asp or
Tyr; Xaa6: Arg or Asn; Xaa8: Glu, Leu
or Tyr; Xaa9: Asp, Lys, Glu or Ile;
Xaa10: Asp, Lys, Glu or Ile; Xaa11:
Asp, Ala, Val or Phe; Xaa12: Asp or Lys;
Xaa13: Val or Leu; Xaa14: Pro, Lys or
Val

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 

Leu Xaa Xaa Xaa Xaa Xaa Glu Xaa Xaa Xaa Xaa Xaa 

1 5 10
Xaa Xaa 

SEQ ID NO: 31 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Asp, Ala or Ile; Xaa5: any amino

acid; Xaa7: Ala or Asn; Xaa9: Val or

Gly; Xaa10: Gly, Ser or Lys; Xaa13:

Gly, Ala, Lys or Ile; Xaa14: Leu or Ala

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 

Pro Xaa Lys Gly Xaa Thr Xaa Ala Xaa Xaa Lys Val 
1 5 10 

Xaa Xaa 



SEQ ID NO: 32 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes
(ix) FEATURE

(D) OTHER INFORMATION:
Xaa3, Xaa7: any amino acid; Xaa5: Asn,

Val or Leu; Xaa6: Ser, Asn or Val
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 



Tyr Val Xaa Leu Xaa Xaa Xaa Asn Leu Thr Asp 
1 5 10 
SEQ ID NO: 33

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa6: any amino acid; Xaa2: Gly

or Ser; Xaa7: Gly or Ala

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 

Cys Xaa Ala Xaa Gly Xaa Xaa 
1 5 

SEQ ID NO: 34 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa3, Xaa6, Xaa8, Xaa10: any amino
acid; Xaa5: Gly or Ser 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 






EP 0 494 502 A1



His Xaa Xaa Ser Xaa Xaa Gly Xaa Gly Xaa Gly 
1 5 10

SEQ ID NO: 35 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa3, Xaa4, Xaa5, Xaa7, Xaa9, Xaa10: any

amino acid; Xaa11: Gly, Ala or Ser

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 

Arg Ser Xaa Xaa Xaa Ala Xaa Ser Xaa Xaa Xaa
1 5 10

SEQ ID NO: 36 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa4, Xaa5, Xaa6, Xaa7, Xaa9, Xaa10,
Xaa12, Xaa14: any amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36

Ala Xaa Ser Xaa Xaa Xaa Xaa Gly Xaa Xaa Gly Xaa 

1 5 10 

Ala Xaa 

SEQ ID NO: 37 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa3, Xaa4, Xaa5: any amino acid; Xaa2:
Gly or Ser; Xaa11: Ala or Gly; Xaa13:
Gly or Ser

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 

Gly Xaa Xaa Xaa Xaa Ser Gly Ala Ser Gly Xaa Gly 

1 5 10
Xaa 

SEQ ID NO: 38 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Gly or Ser; Xaa5, Xaa8, Xaa10,

Xaa12: any amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 

Ser Xaa Ala Ser Xaa Gly Ser Xaa Gly Xaa Ser Xaa 

1 5 10 

Gly 

SEQ ID NO: 39 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2: Gly or Ala; Xaa6: Ala or Ser;

Xaa9: Ala or Ser; Xaa10: Gly or Ser;

Xaa4, Xaa7: any amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 

Thr Xaa Ala Xaa Ala Xaa Xaa Ala Xaa Xaa 
1 5 10

SEQ ID NO: 40 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa3, Xaa7, Xaa11, Xaa12: any amino

acid; Xaa5: Gly or Ala; Xaa8: Ser or

Ala; Xaa10: Ser or Ala



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 

Asn Xaa Xaa Ser Xaa Ala Xaa Xaa Gly Xaa Xaa Xaa 

1 5 10 

Ser 


SEQ ID NO: 41 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa6, Xaa7: any amino acid; Xaa3:

Leu or Lys; Xaa13: Gly or Lys

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41

Asp Pro Xaa Xaa Leu Xaa Xaa Val Lys Leu Leu Leu

1 5 10

Xaa

SEQ ID NO: 42

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: yes 

(ix) FEATURE

(D) OTHER INFORMATION:

Xaa2, Xaa3, Xaa4, Xaa5, Xaa7, Xaa12: any

amino acid; Xaa10: Asp or Leu; Xaa11:

Gly or Asp; Xaa13: Gly or Lys

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42

Glu Xaa Xaa Xaa Xaa Lys Xaa Leu Leu Xaa Xaa Xaa 

1 5 10 

Xaa 

SEQ ID NO: 43 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa6, Xaa9, Xaa10: any amino acid; Xaa2:
Gly or Leu; Xaa5: Gly or Val; Xaa7:
Val or Asp; Xaa11: Gly or Leu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43

Phe Xaa Asp Asp Xaa Xaa Xaa Val Xaa Xaa Xaa Val
1 5 10

Gly Asp 

SEQ ID NO: 44 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa5: any amino acid; Xaa3: Asp
or Lys; Xaa9: Asp or Lys; Xaa10: Val
or Leu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44

Lys Leu Xaa Xaa Xaa Gly Leu Gly Xaa Xaa Gly Leu

1 5 10

Asp Leu 

SEQ ID NO: 45 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: yes 
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa4, Xaa6, Xaa7, Xaa8: any amino acid;

Xaa5: Gly or Asp; Xaa10: Asp or Lys;

Xaa11: Asp or Val; Xaa12: Asp or Lys;

Xaa13: Val or Leu; Xaa14: Lys or Val

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 

Leu Lys Asp Xaa Xaa Xaa Xaa Xaa Leu Xaa Xaa Xaa 

1 5 10 

Xaa Xaa 



SEQ ID NO: 46 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 14 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa5, Xaa6, Xaa7, Xaa8: any amino acid;
Xaa9: Val or Gly; Xaa10: Gly or Lys;
Xaa13: Gly or Lys
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46 



Pro Asp Lys Gly Xaa Xaa Xaa Xaa Xaa Xaa Lys Val 

1 5 10
Xaa Leu 

SEQ ID NO: 47 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: yes
(ix) FEATURE

(D) OTHER INFORMATION:

Xaa3, Xaa7, Xaa8, Xaa10: any amino acid;
Xaa5: Val or Leu
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47 

Tyr Val Xaa Leu Xaa Val Xaa Xaa Leu Xaa Asp 

SEQ ID NO: 48 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 117 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48 

Thr Gly Cys Ala Phe Arg Ser Asp Glu Ser Val Arg 
1 5 10

Thr Lys Gln Asp Cys Gln Val Ala Glu Ser His His
15 20

Val Asn Xaa Xaa Thr Val Leu Arg Ser Val Gly Val
25 30 35 

Cys Tyr Leu Val Gly Gly Gly Asp Cys Ser Leu Arg 
40 45 

Val Gly Ala Ala Val Xaa Leu Val Lys Ala Tyr Gln
50 55 60

Val Ile Tyr His Asn Val Cys Phe Ser Gly Xaa Leu

65 70 



Leu Cys Leu Leu Cys His Trp Phe Pro Asp Leu Ala 
75 80 

Leu Ala Ala Thr Gly Tyr Ser His Ser Thr Trp Lys 
85 90 95 

Phe Glu Arg Pro Arg Leu Tyr Thr Met Arg Gly Glu 
100 105 

Ser Lys Leu Phe Xaa Pro Asp Thr Leu 
110 115 




Claims 

1. A method for preparing a method for surmising the functional site or region in a physiologically active
polypeptide whose amino acid sequence is known (test polypeptide), which comprises

(1) selecting, as the reference polypeptides, a plurality of known physiologically active peptides whose
action site amino acid is known; supposing, as the functional regions in the known amino acid sequence
of these reference polypeptides (reference amino acid sequence), the supposed functional regions which
consist of a plurality of amino acids in both downstream and upstream of the active site amino acid,
and the supposed non-functional region excluding a functional region;

(2) supposing many amino acid sequence patterns (referred to as supposed amino acid sequence patterns)
conforming to the following formula (I) consisting of two or more amino acids selected from the 20
amino acids usually existing in natural physiologically active polypeptides:

$$X_1(Z)_nX_2 \qquad \text{(I)}$$

(wherein $X_1$ and $X_2$ are the same or different, and are each any specific amino acid, $Z$ are the same
or different unspecific amino acids, $n$ is 0 or an integer of up to about 30, and $X_1$ is here referred to as
a reference amino acid);

(3) scanning, by each of the many supposed amino acid sequence patterns of the above (2), the reference
amino acid sequences selected in the above (1) with the supposed functional region and/or the
supposed non-functional region; selecting supposed amino acid sequence patterns having a high frequency
in accordance with the supposed functional region and/or supposed amino acid sequence patterns
having a high frequency in accordance with the supposed non-functional region; and, with respect
to each supposed region, superposing the supposed amino acid sequence patterns which have the same
reference amino acid $X_1$, wherein the reference amino acid $X_1$ is rearranged, to determine an amino acid
sequence pattern related to the supposed functional region (supposed functional region-related amino
acid sequence pattern) or an amino acid sequence pattern related to the supposed non-functional
region (supposed non-functional region-related amino acid sequence pattern) (both are all-inclusively
referred to as a related amino acid sequence pattern); and

(4) determining symmetrical related amino acid sequence patterns consisting of the supposed functional
and non-functional region-related amino acid sequence patterns determined in the above (3),
which have a small number of amino acids extending symmetrically in the directions upstream and
downstream of the reference amino acid $X_1$ as a center; comparing the symmetrical related amino acid
sequence pattern with the reference amino acid sequences selected in the above (1) by superposing;
determining the number of amino acids that are identical between both sequences within the range of a
predetermined number of amino acids containing the reference amino acid as the center (identical amino
acids); and comparing the results with the supposed functional region and/or supposed non-functional
region in the reference amino acid sequences, to prepare a criterion for the determination of the
functional sites or regions.
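Steps (2) and (3) of claim 1 can be pictured with a small sketch. It is not the applicant's implementation: the window that defines the supposed functional region, the smoothed rate ratio used as the "high frequency" score, and all function names are assumptions. Patterns of formula (I) are keyed as (X1, n, X2) tuples, counted inside and outside the supposed functional region of each reference sequence, ranked, and then superposed on a shared reference amino acid X1.

```python
from collections import Counter


def pattern_counts(seq, region, max_gap):
    """Count every pattern X1 (Z)n X2 in seq, keyed (X1, n, X2).

    X1 sits at index i and X2 at i + n + 1, with n unspecified residues Z in
    between. Occurrences are split by whether X1 (the reference amino acid)
    lies inside the supposed functional region, given as a set of indices.
    """
    inside, outside = Counter(), Counter()
    for i, x1 in enumerate(seq):
        for n in range(max_gap + 1):
            j = i + n + 1
            if j >= len(seq):
                break
            (inside if i in region else outside)[(x1, n, seq[j])] += 1
    return inside, outside


def enriched_patterns(references, half_width, max_gap, top):
    """Rank patterns by how much more often they occur in functional regions.

    references: (sequence, active-site index) pairs. The supposed functional
    region is taken to be the active-site residue plus half_width residues on
    each side; this window definition is an assumption for illustration.
    """
    inside, outside = Counter(), Counter()
    n_in = n_out = 0
    for seq, site in references:
        region = set(range(max(0, site - half_width), min(len(seq), site + half_width + 1)))
        a, b = pattern_counts(seq, region, max_gap)
        inside.update(a)
        outside.update(b)
        n_in += len(region)
        n_out += len(seq) - len(region)
    # Ratio of per-residue rates, with +1 smoothing to avoid division by zero.
    score = {k: ((inside[k] + 1) / (n_in + 1)) / ((outside[k] + 1) / (n_out + 1))
             for k in inside}
    return sorted(score, key=score.get, reverse=True)[:top]


def superpose(patterns, ref):
    """Align the selected patterns that share X1 == ref into one positional
    pattern, e.g. (S, 0, G) and (S, 2, A) give "SG.A"."""
    offsets = {0: {ref}}
    for x1, n, x2 in patterns:
        if x1 == ref:
            offsets.setdefault(n + 1, set()).add(x2)
    cells = []
    for k in range(max(offsets) + 1):
        aas = sorted(offsets.get(k, ()))
        if not aas:
            cells.append(".")
        elif len(aas) == 1:
            cells.append(aas[0])
        else:
            cells.append("[" + "".join(aas) + "]")
    return "".join(cells)


refs = [("AAAAGDSGAAAAAAAA", 5)]  # toy reference sequence, active site at index 5
print(sorted(enriched_patterns(refs, half_width=1, max_gap=0, top=3)))
print(superpose([("S", 0, "G"), ("S", 2, "A"), ("D", 0, "S")], "S"))  # SG.A
```

Only patterns seen at least once in a functional region are scored, which keeps the ranking small; a real implementation would also rank the non-functional side, as step (3) allows.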



2. A method for identifying the functional site or region of a polypeptide whose amino acid sequence is known
but whose functional site or region is not known (test polypeptide), which comprises comparing the amino
acid sequence of the test polypeptide with the symmetrical related amino acid sequence patterns determined in
step (4) of claim 1, and applying to the results the criterion determined in step (4) of claim 1.
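The identical-amino-acid count of step (4) of claim 1, and its use on a test polypeptide in claim 2, can be sketched as follows. The rule used for the criterion (the minimum count observed at a known active site), the function names and the toy sequences are assumptions made only for illustration; the claims leave the exact criterion open.

```python
def identity_profile(seq, pattern, center):
    """Slide a symmetric related pattern along seq and count identical residues.

    pattern: one-letter codes with "." for unspecified positions; center is the
    index of the reference amino acid X1 inside pattern. Returns a dict mapping
    the sequence index aligned with X1 to the number of identical amino acids.
    """
    scores = {}
    for start in range(len(seq) - len(pattern) + 1):
        scores[start + center] = sum(
            1 for k, p in enumerate(pattern) if p != "." and seq[start + k] == p
        )
    return scores


def calibrate(references, pattern, center):
    """Assumed criterion: the lowest identity count observed at any known
    active site among the reference sequences."""
    return min(identity_profile(seq, pattern, center).get(site, 0)
               for seq, site in references)


def surmise_sites(seq, pattern, center, threshold):
    """Indices of the test sequence whose identity count meets the criterion."""
    return [pos for pos, s in identity_profile(seq, pattern, center).items()
            if s >= threshold]


threshold = calibrate([("AAGDSGGAA", 4)], "G.S.G", 2)
print(threshold)                                                 # 3
print(surmise_sites("KLGASGGTTGESAG", "G.S.G", 2, threshold))  # [4, 11]
```

Positions closer to either end than half the pattern width are never scored here; that edge handling is a simplification of the sketch, not something the claims specify.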

3. A method for preparing a method for surmising a functional site or region in a physiologically active
polypeptide whose amino acid sequence is known (test polypeptide), which comprises

(1) selecting, as the reference polypeptides, a plurality of known physiologically active peptides whose
action site amino acid is known; supposing, as the functional regions in the known amino acid sequence
of these reference polypeptides (reference amino acid sequence), the supposed functional regions
which consist of a plurality of amino acids in both downstream and upstream of the active site amino
acid, and the supposed non-functional region excluding a functional region;

(2) supposing many amino acid sequence patterns (referred to as secondary supposed amino acid
sequence patterns) conforming to the following formula (II):

$$X_1'(Z)_mX_2' \qquad \text{(II)}$$

(wherein $X_1'$ and $X_2'$ are the same or different, and are each any specific amino acid, $Z$ are the same
or different unspecific amino acids, and $m$ is 0 or an integer of up to about 30, wherein $X_2'$ is the same
as $X_1'$, or is any one of the following: the amino acid obtained when the genetic codon encoding $X_1'$
is read in the reverse direction of 3' -> 5', the amino acid obtained when the antisense codon corresponding
to that codon is read in the direction of 5' -> 3', or the amino acid obtained when the antisense codon
corresponding to that codon is read in the reverse direction of 3' -> 5'); and

(3) carrying out multivariate analysis for inducing a discriminant function from the reference amino acid 
sequences to distinguish the supposed functional region from supposed non-functional region of the 
above (1) by the secondary supposed amino acid sequence patterns determined in the above (2). 
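The three codon readings that define X2' in formula (II) can be made concrete with the standard genetic code. This sketch assumes that every synonymous codon of X1' is considered and that a stop codon yields no partner; neither point is spelled out in the claim, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...);
# "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def secondary_partners(x1):
    """Candidate X2' residues for X1' = x1 under formula (II).

    For every codon of x1 (written 5'->3'), collect the residue encoded by:
      - the codon read in reverse (3'->5'),
      - the antisense codon read 5'->3' (the reverse complement),
      - the antisense codon read 3'->5' (the plain complement).
    X2' may also equal X1'. Stop codons are dropped.
    """
    partners = {x1}
    for codon, residue in CODE.items():
        if residue != x1:
            continue
        complement = codon.translate(COMPLEMENT)
        for variant in (codon[::-1], complement[::-1], complement):
            partners.add(CODE[variant])
    partners.discard("*")
    return partners


def secondary_patterns(max_gap):
    """All (X1', m, X2') triples of formula (II) with 0 <= m <= max_gap."""
    return [(x1, m, x2)
            for x1 in "ACDEFGHIKLMNPQRSTVWY"
            for x2 in sorted(secondary_partners(x1))
            for m in range(max_gap + 1)]


print(sorted(secondary_partners("M")))  # ['H', 'M', 'V', 'Y']
print(sorted(secondary_partners("W")))  # ['G', 'P', 'T', 'W']
```

For Met (ATG only), the reversed codon GTA gives Val, the antisense codon CAT gives His, and the complement TAC gives Tyr, which is why four residues are returned.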

4. A method for surmising the functional site or region in a physiologically active polypeptide whose amino
acid sequence is known (test polypeptide), which comprises applying the discriminant function determined 
in the step (3) of claim 3 to the amino acid sequence of the test polypeptide. 
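Claim 3 only requires "multivariate analysis" to induce the discriminant function that claim 4 applies. A two-class Fisher linear discriminant is one standard choice; the sketch below uses it, assuming each window of a reference sequence has already been reduced to counts of two secondary supposed patterns. The feature vectors are invented toy data, not results from this document, and the function names are hypothetical.

```python
def fisher_lda(class_a, class_b, ridge=1e-6):
    """Two-class Fisher linear discriminant for 2-D feature vectors.

    Returns (w, c): a vector x is assigned to class A when w . x > c.
    A small ridge term keeps the 2x2 scatter matrix invertible.
    """
    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(2)]

    ma, mb = mean(class_a), mean(class_b)
    # Pooled within-class scatter matrix S.
    s = [[ridge, 0.0], [0.0, ridge]]
    for rows, m in ((class_a, ma), (class_b, mb)):
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for p in range(2):
                for q in range(2):
                    s[p][q] += d[p] * d[q]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    # w = S^-1 (mean_A - mean_B)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between the projected class means.
    c = (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2
    return w, c


def is_functional(w, c, x):
    """Apply the discriminant function to one feature vector."""
    return w[0] * x[0] + w[1] * x[1] > c


# Hypothetical feature vectors: counts of two secondary patterns per window.
functional = [(3, 2), (4, 3), (3, 3), (4, 2)]
non_functional = [(0, 1), (1, 0), (0, 0), (1, 1)]
w, c = fisher_lda(functional, non_functional)
print(is_functional(w, c, (3, 3)), is_functional(w, c, (1, 0)))  # True False
```

With more than two pattern counts per window the same idea needs a general matrix inverse; the 2-D case is used here only to keep the sketch dependency-free.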

5. A method for surmising a site or a region in a polynucleotide, encoding the functional site or region in a
physiologically active polypeptide, which comprises applying the operations described in claim 1 or 2 to
an amino acid sequence translated from the nucleotide sequence of the polynucleotide.
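For claims 5 and 6 the nucleotide sequence is first translated and the amino acid methods are then applied. A minimal sketch, assuming the standard genetic code and only the three forward reading frames (the reverse strand is ignored here), maps any amino-acid hit back to nucleotide coordinates. The DNA string, the motif and the function names are toy, hypothetical inputs.

```python
import re

BASES = "TCAG"
# Standard genetic code in TCAG codon order; "*" is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}


def translate(dna, frame=0):
    """Translate one forward reading frame; unknown codons become "X"."""
    dna = dna.upper()
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))


def surmise_nucleotide_sites(dna, aa_regex):
    """Scan the three forward frames for an amino-acid motif and report hits
    as (frame, nucleotide start, nucleotide end), end exclusive."""
    hits = []
    for frame in range(3):
        for m in re.finditer(aa_regex, translate(dna, frame)):
            hits.append((frame, frame + 3 * m.start(), frame + 3 * m.end()))
    return hits


print(translate("ATGTGG"))                                   # MW
print(surmise_nucleotide_sites("AGGCGATAGCGGCT", "GDSG"))  # [(1, 1, 13)]
```

The nucleotide span returned for a hit is the region that claims 9 and 10 would draw successive nucleotides from.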

6. A method for surmising a site or a region in a polynucleotide, encoding the action site or region in a
physiologically active polypeptide, which comprises applying the operations described in claim 3 or 4 to 
an amino acid sequence translated from the nucleotide sequence of the polynucleotide. 

7. A synthetic polypeptide composed using at least two successive amino acids in the amino acid sequence
of the functional site or region surmised by claim 2. 

8. A synthetic polypeptide using at least two successive amino acids in the amino acid sequence of the
functional site or region surmised by claim 4.

9. A synthetic polynucleotide using at least six successive nucleotides in the nucleotide sequence of the site
or region of a polynucleotide surmised by claim 5. 

10. A synthetic polynucleotide composed using at least six successive nucleotides in the nucleotide sequence
of the site or region of a polypeptide surmised by claim 6. 

11. A method of preparing a polypeptide or polynucleotide comprising performing a method according to any
of claims 1-6 in order to surmise a sequence of a functional site or region of a physiologically active 
polypeptide or polynucleotide; and synthesising a polypeptide or a polynucleotide including at least part 
of said sequence, or a polynucleotide encoding at least part of said polypeptide sequence. 